SECTION I





INTRODUCTION  AND  SCOPE 


This report presents the results of the investigations, analysis, custom LSI development,
breadboard design and operation, and supporting tasks on contract F33615-73-C-1089,
"LSI Electronically Programmable Logic Arrays."*


The  main  objective  of  this  investigation  was  basically,  "the  realization  of  parallel  array 
architectures  utilizing  LSI,  distributed  logic/memory  circuits  with  the  capability  of  being 
programmed  or  trained  to  perform  a variety  of  signal  processing  functions." 

The  following  expressions  are  defined  before  proceeding  with  the  introductory  material. 

Configurable  Polynomial  Array  (CPA)  — A 2-dimensional  array  or  network  of  elements 
capable  of  being  reconfigured  (or  reconnected)  in  a variety  of  ways  to  describe 
arbitrarily  complex  input-output  relationships. 

Element  — A building  block  (of  a CPA)  capable  of  implementing  a family  of  multinomial 
expressions  of  its  inputs  with  varying  coefficients  or  weights. 

The  goals  of  such  elements  and  the  array  in  which  they  can  be  imbedded  include: 

• the  ability  to  perform  a comprehensive  family  of  signal  processing  functions  at 
higher  speeds  than  is  possible  with  conventional  serial  computers  (using  the  same 
basic  technologies)  by  virtue  of  the  inherent  parallelism  and  structure  of  the  CPA 
concept 

• the  ability  to  alter  the  processing  function  very  easily  and  rapidly 

• the ability to substantially simplify overall software requirements in the application
of these techniques to signal processing functions compared to conventional computer
methods

• the ability to efficiently implement the complex, non-linear, multinomial expressions
associated with Adaptive Learning Networks (ALN's) as well as more conventional
linear transformations (such as FFT)


• the ability to be used with training algorithms in the synthesis of the networks or
arrays for a variety of problems associated with pattern recognition, classification,
or discrimination, modeling and identification.

*A more descriptive title for the effort, used throughout this report, is "Configurable
Polynomial Arrays" (CPA's). While the arrays are programmable and were specifically
conceived to embody LSI technology, they are not to be confused with the "ROM-like"
programmable chips used to synthesize Boolean expressions.

The basic tasks addressed in this investigation included:

1) A study of appropriate architectures, using state-of-the-art technology, necessary
to efficiently implement the elements for use in CPA's. This major task area required
detailed functional designs for estimating the performance of these elements,
tradeoff analysis to compare alternate approaches as a function of calculation
precision, and identification of LSI for realization of useful elements

2)  The  design  and  fabrication  of  a custom  LSI  circuit  for  use  in  a variety  of  versatile 
elements 

3)  The  design,  construction  and  demonstration  of  a brassboard  element  using  the 
special  custom  chips  as  well  as  other  existing  developmental  LSI  and  commercially 
available  parts 

4)  The  analysis  of  word  length  or  precision  requirements  for  CPA  elements  and  the 
implication  of  these  results 

5) The identification of applications in Avionics signal processing in which CPA's can
be used to advantage.

Prior to elaborating upon these tasks in the body of this report, a brief perspective of how
these arrays differ in nature from other array technologies is in order. Arrays for processing
signals can be categorized as either analog (Ref. 1, 2, 3, 4, 5), binary (Ref. 6, 7), or
numerical (Ref. 8, 9, 10), depending on the nature of the data and the primitive or "kernel"
function which an element of the array possesses. The analog approaches were representative
of some of the early work in arrays useful for classification, feature extraction, and pattern
recognition. Comprised of threshold elements (basically analog multipliers and summers),
they possessed the ability for weight and threshold adjustment. Such arrays of elements
could be configured to provide for a variety of signal processing functions. An analog
element would provide an output of one state if the weighted sum of the inputs exceeded a
given threshold, or an output of the opposite state otherwise. Analog computer techniques
are representative of this type of array.

The binary cellular arrays include such regularly structured systems as associative memories
and image processing arrays which can, in effect, perform parallel transformations on
binary input patterns.
The numerical element based array is, in effect, the subject of this report. The numerical
array consists of arithmetic elements operating in parallel on numbers rather than on bits
or analog levels. It is considered to be the most flexible of the three since it can, in
general, realize the functions of the analog or binary arrays. However, it may not always be
as efficient as one of the other schemes in particular applications. One of its major
advantages over analog arrays is the precision and dynamic range it can provide. On the
other hand, in some cases where one or two layers of elements (particularly linear elements)
suffice in providing a transformation or classification, a numerical unit may not be
necessary, although it can be "programmed" to do this. Its compatibility with digital
computers and its versatility are probably the main factors for considering this approach
now.

This report begins with the element structure, functions, and applications (Sections II and
III) before going into the hardware aspects. Sections IV, V, and VI present all major
hardware and LSI design aspects. Preliminary analysis of architectural approaches
(discussed in the Interim Report, Ref. 11) identified a key LSI circuit for development. This
circuit and its use in a demonstrable brassboard are elaborated upon. In each case (Sections
II and VI) liberal use is made of supporting appendix material (Appendices A through H) for
more detailed and comprehensive explanations and descriptions. Appendix A presents a
history and survey of the field of polynomial networks. Appendices A, B, and E describe a
procedure for synthesizing these networks from data observations concerning the behavior
of a system or process being modeled. Appendix C outlines the principles of networks that
use cluster distance-computation primitives in multi-class classifiers. Appendices D, E,
and F describe the use of the LSI multiplier-summer chip (while Appendix H describes in
detail the actual fabrication and performance of the chip) for CPA's and summarize its
potential applications. Appendix G presents results of an application of the nonlinear
multinomial primitive to a signal-classification problem arising in unattended ground sensors
(used in remote surveillance systems and in target activated munitions). Sections VII and
VIII give the analysis and supporting simulations involved in the CPA precision requirements.
Section IX discusses how digital filtering can be regarded with CPA's with respect to adaptive
learning nets - a new and potentially important application area. The final technical effort,
reported in Section X, describes how pre-processing functions usually required with CPA's can
A basic element is depicted in Fig. 1(a); the figure shows its major control, input, and
output signals. Such elements are, in general, structured in multi-layered nets, as shown in
Fig. 1(b). The interconnection control, which is part of each element, or which appears
functionally between successive layers of elements, has inputs which can route the output
signals of any one element to the desired input of an element in the next layer. Each element
and interconnect switch in the array has sufficient memory for storing the function type it is
to compute, the necessary weights or coefficients of its function, and the interconnect data.
Hence, this programmable array can be considered as a distributed logic-memory processor
capable of rapid reconfiguration and calculation of complex transformations.

2.1 MULTIPLEXING

Before describing the element requirements and architecture, the three possible
configurations of programmable arrays will be explained; these configurations are shown in
Fig. 2.

In (a), a net of dimension j x k is shown fully populated, with each element containing an
m-bit store for necessary function and control. This configuration can process (or transform)
j/2 inputs at a time and can be operated in a pipeline mode so that k sets of data are
simultaneously operated upon in different layers.

In (b), a single layer of elements can be multiplexed to simulate a whole net. The memory
of each element must now have km bits to provide the necessary control as the processor of
the element acts, in turn, to realize each of the k layers. For any single problem, the
speed is the same as in (a), but pipelining cannot be done.

In (c), a single processing element with jkm bits of storage can be used to process inputs
within a layer and then successive layers. Provisions must also be made for storage of
intermediate results in (b) and (c); approach (c) will have 1/j the speed of (b).
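
The tradeoff among the three configurations can be tabulated with a short sketch. This is only an illustrative model under stated assumptions: the `Config` record, the function name `configurations`, and the unit of time (one layer evaluation by a full layer of j elements) are not from the report; the element counts, store sizes, and the 1/j speed ratio are taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    elements: int          # physical processing elements
    bits_per_element: int  # function/control store held by each element
    relative_time: int     # time for one data set, in layer-evaluation units (assumed unit)
    pipelined: bool        # whether k data sets can be in flight at once


def configurations(j: int, k: int, m: int) -> list[Config]:
    """Hypothetical summary of Fig. 2(a), (b), (c) for a j x k net with m-bit stores."""
    return [
        # (a) fully populated net: k layer times per data set, pipelining possible
        Config("(a) full j x k net", j * k, m, k, True),
        # (b) one layer reused k times: same speed for a single problem, no pipelining
        Config("(b) one multiplexed layer", j, k * m, k, False),
        # (c) one element reused j*k times: 1/j the speed of (b)
        Config("(c) one multiplexed element", 1, j * k * m, j * k, False),
    ]


for cfg in configurations(j=8, k=4, m=32):
    total_bits = cfg.elements * cfg.bits_per_element
    print(cfg.name, cfg.elements, cfg.bits_per_element, total_bits, cfg.relative_time)
```

In this model the total control storage is jkm bits in all three cases; only the number of processors, and hence the speed, changes.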

It is envisioned that arrays of up to 256 elements (or equivalent) could provide the processing
capacity for a vast majority of applications. Hence, the design, using either the configuration
in Fig. 2(b) or (c), will have the capacity for memory expansion to simulate nets of up to
256 elements. These arrays will, in general, be loaded and controlled by a computer.

Figure 1. Programmable-Function Array Structure

2.2 LOADING AND CONTROLLING

The loading and controlling of a parallel array, either fully or partially populated, is
elaborated upon in Fig. 3. The host computer will initially enter a weight vector(s) via a
common bus and element addressing. (Each element will have an appropriate decoder as part of
its interface control logic.) The interconnect memory, also loaded by the host, controls the
inter-layer switches. After "setting up" an array, input data can be derived directly from
its source or through the host, depending on the conditioning required.

A CPA, then, can embody various degrees of hardware according to the following hierarchy
of approaches:

1) use of all software in existing computer

2) software with hardware multiply

3) a dedicated hardware element (for multinomial implementation) used in a multiplexed
mode

4) multi-element array

5) multi-arrays

Increasing hardware complexity results in higher throughput and introduces the possibility of
pipelining for yet higher effective processing rates. (In this case one CPA can be performing
pre-processing on a set of data while a second CPA is transforming the previously
pre-processed data.)

The relative performance of approaches 2 and 3 will be seen later in the tradeoff analysis with
respect to both high speed and low speed micro-processor applications.

Figure 3. Loading and Controlling a Parallel Array


2.3 FUNCTIONS OF AN ELEMENT

Table 1 lists the five basic expressions realizable by the element. Although any one of a
number of kernels or primitives (or families thereof)* could be chosen, in theory, to satisfy
the requirements for an element useful in CPA's, the basic list shown was chosen for
the following reasons:

1) P1: Nonlinear Multinomials, valuable for nonlinear filters, discriminant functions,
estimation networks, and models.

2) P2: Multilinear Multinomials, used in multilinear and nonlinear filters, discriminant
functions, estimation networks, and models. Providing P1 automatically provides the
same computational power required for the simultaneous calculation of
two P2 expressions.

3) P3, P4: Linear Multinomials, used in FFT's, transversal filters, linear discriminant
functions, etc.

4) P5: Component of Normalized Distance Measurement or Recursive Division, used
to determine membership or non-membership of a data word in a cluster of prior
measurements belonging to a given class.

Characteristics of the above family of primitive functions include:

• the interchangeability of the coefficients and variables

• a variable number of multiply operations: 3 for P5, 4 for P3 and P4, 8 for two P2's,
and 8 for P1. (While the terms of P1 could have been grouped to require only
5 multiply operations, the element could then only handle a single P2. Speeds
would have been about the same since two levels of multiplication would have still
been required.)
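
The multiply counts quoted above can be checked with a small sketch. Table 1 is not reproduced in this text, so the forms used here are assumptions: P1 is taken to be the usual two-input, six-term quadratic w0 + w1·x1 + w2·x2 + w3·x1·x2 + w4·x1² + w5·x2², and P2 the four-term multilinear form without the squared terms. The class and function names are hypothetical.

```python
class MultiplyCounter:
    """Counts multiplications so that alternative groupings can be compared."""

    def __init__(self):
        self.multiplies = 0

    def mul(self, a, b):
        self.multiplies += 1
        return a * b


def p1_expanded(w, x1, x2, c):
    """Assumed P1, term by term: two levels of multiplication, 8 multiplies."""
    # Level 1: linear terms and raw second-order products
    t1 = c.mul(w[1], x1)
    t2 = c.mul(w[2], x2)
    x1x2 = c.mul(x1, x2)
    x1x1 = c.mul(x1, x1)
    x2x2 = c.mul(x2, x2)
    # Level 2: weight the second-order products
    t3 = c.mul(w[3], x1x2)
    t4 = c.mul(w[4], x1x1)
    t5 = c.mul(w[5], x2x2)
    return w[0] + t1 + t2 + t3 + t4 + t5


def p1_grouped(w, x1, x2, c):
    """Same polynomial with terms grouped: still two levels, but only 5 multiplies."""
    inner1 = w[1] + c.mul(w[3], x2) + c.mul(w[4], x1)
    inner2 = w[2] + c.mul(w[5], x2)
    return w[0] + c.mul(x1, inner1) + c.mul(x2, inner2)


def p2(w, x1, x2, c):
    """Assumed P2 (multilinear): 4 multiplies, so two P2's need 8."""
    return w[0] + c.mul(w[1], x1) + c.mul(w[2], x2) + c.mul(w[3], c.mul(x1, x2))


w = [1, 2, 3, 4, 5, 6]
for fn in (p1_expanded, p1_grouped):
    counter = MultiplyCounter()
    print(fn.__name__, fn(w, 2, 3, counter), counter.multiplies)
```

Under these assumptions the expanded form uses the same eight multiplier positions as two P2 expressions, which is consistent with the remark that the grouped five-multiply form would support only a single P2.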


*While any single function would suffice for an element, choosing from amongst a family
usually imparts a higher speed for a given element in a given array position.


TABLE 1. CPA FUNCTIONS
APPLICATIONS OF CPA's

3.1 AVIONICS COMPUTATION REQUIREMENTS

Avionics computation requirements arise in two major areas (Reference 12): (a) general
purpose computing tasks, typically characterized by I/O sampling rates below 500 Hz and a
variety of logic, control, and arithmetic operations performed at throughput rates below
500 K-Ops; and (b) signal processing tasks, typically having I/O rates in excess of 1 MHz,
"a relatively narrow regular flow of arithmetic operations," and high computational speeds
in the 10-100 MHz range. As stressed in the reference, on-going developments in LSI
arrays for signal processing may produce economic and performance benefits comparable
to those demonstrated in general purpose computation by LSI microprocessor and
microcontroller elements.

For airborne signal processing applications, the reference cites requirements in the
following areas:

Radar - air-to-air and air-to-ground modes, including synthetic aperture ground
mapping.

Electronic Warfare - signal sorting and classification.

Communications - image/waveform coding and decoding.

The algorithms of particular importance in airborne signal processing are, in accordance
with the reference:

• Digital Fourier transforms and inverse transforms up to 2,048 points

• Digital filters - recursive and nonrecursive

• Weighting functions - cosine-squared, Taylor, Hamming, etc.

• Correlations - serial and parallel, with various levels and combinations of source
and reference signals

• Walsh functions, Hadamard transforms, and related waveform/image coding
transformations

• Adaptive predictive coding and other bandwidth compression techniques for data

• Nested polynomial functions

• Look-up tables

• Integration, averaging, and standard deviation

• Coordinate conversion.

The  following  algorithms  are  added  by  the  authors  to  the  above  signal  processing  functions: 

• Recursive  means  and  covariance  matrices  of  vector  waveforms 

• Convolutions  and  deconvolutions 

• Data  cluster  screening 

• Complex  Boolean  arithmetic 

It is believed that CPA's are best suited to signal processing applications because of the
functions implemented and the extremely high processing speed of distributed-function
networks of CPA devices. Accordingly, attention in this report will be directed primarily toward
the signal processing area.

3.2 FULFILLMENT OF AVIONICS SIGNAL PROCESSING REQUIREMENTS WITH CPA's

The Avionics signal processing requirements enumerated in 3.1 can all be fulfilled using the
P1, P2, P3, P4, and/or P5 primitive functions defined in 2.3. Table 2, Parts 1-3, presents
the details of this.
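
As one illustration of how a listed algorithm reduces to the linear primitives, the sketch below builds a radix-2 FFT from a butterfly whose twiddle product is written as two real linear forms, four real multiplies in total, which matches the multiply count quoted for P3 and P4. The exact P3 and P4 expressions of Table 1 are not reproduced here, so the mapping onto them is an assumption, and the function names are hypothetical.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex):
    """One FFT butterfly; the product w*b is formed from four real multiplies."""
    re = w.real * b.real - w.imag * b.imag   # linear form in (b.real, b.imag)
    im = w.real * b.imag + w.imag * b.real   # second linear form, same inputs
    t = complex(re, im)
    return a + t, a - t


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor (the "coefficient")
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


print(fft([1, 1, 1, 1]))
```

In a CPA the twiddle factors would play the role of the stored coefficients and the butterflies of successive stages would map onto successive layers; that correspondence is a reading of the text, not a statement of Table 2.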

3.3 RELATIONSHIP TO ADAPTIVE LEARNING NETWORKS (ALN's)

As one of the main factors influencing the development of CPA's, the ideas behind their use
in Adaptive Learning Nets (ALN's), besides known transformations, will be briefly discussed.
Fig. 4 is a flow chart showing how an input data set is used to train and evaluate a net in the
process of "fitting a surface or determining a transformation" for discrimination,
classification, etc.

TABLE 2, PART 1 - ALGORITHMS FOR AVIONICS SIGNAL PROCESSING

Figure 4. Adaptive Learning Method

In this process, the input data is initially subdivided into two independent main groups: a
fitting and selection subset and an evaluation subset. These groups contain some parameters
which are used directly as inputs to a CPA. More generally, features are extracted from
the input data, providing other, frequently more useful, inputs to the CPA. A combination of
these inputs is then used, in conjunction with the techniques listed, in the synthesis of a CPA
network, i.e., optimization of coefficients element by element by pairing inputs, followed
by multi-layer synthesis, etc. This is described in more detail in Ref. 17 and in the
appendices.

Evaluation of the CPA with the independent testing subset is then used to determine if
re-adjustment of the coefficients and connectivity is in fact required. Finally, the net is
placed "on line" for its intended function. Changes in the environment trigger a re-synthesis
routine to maintain a "fit" for the process.
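
The fit-then-evaluate cycle for a single element can be sketched as follows. This is a minimal illustration, not the synthesis procedure of Ref. 17: it assumes a two-input quadratic element, fits its six coefficients by ordinary least squares (normal equations solved by Gaussian elimination), and scores the result on the independent evaluation subset. All function names and the synthetic data are hypothetical.

```python
def features(x1, x2):
    # Assumed two-input quadratic element: 1, x1, x2, x1*x2, x1^2, x2^2
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def solve(a, b):
    """Solve a*w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_element(samples):
    """Least-squares coefficients from (x1, x2, y) samples via the normal equations."""
    rows = [features(x1, x2) for x1, x2, _ in samples]
    ys = [y for _, _, y in samples]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(ata, aty)


def mean_squared_error(w, samples):
    errs = [(sum(wi * f for wi, f in zip(w, features(x1, x2))) - y) ** 2
            for x1, x2, y in samples]
    return sum(errs) / len(errs)


# Synthetic data from a known surface, split into fitting and evaluation subsets.
grid = [v / 4.0 for v in range(-4, 5)]
data = [(a, b, 1.0 + 2.0 * a * b - 0.5 * b * b) for a in grid for b in grid]
fitting, evaluation = data[0::2], data[1::2]
weights = fit_element(fitting)
print(weights, mean_squared_error(weights, evaluation))
```

A large evaluation error relative to the fitting error would be the signal, in the flow of Fig. 4, that coefficients or connectivity need re-adjustment.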

Such methodology (Appendices A and F) has been used effectively in the following
aerospace applications:

1. nondestructive inspection of structural parts

2. trajectory predictions

3. target signature classifications

4. radar refractive index corrections

5. detection of remote nuclear events

6. voice data processing

7. reconnaissance image processing

8. electronic warfare

9. avionics information systems

Once a net has been trained it can be used in conjunction with known or other synthesized net
configurations in the solution of a given problem. This is illustrated in Fig. 5, where a
micro-processor is used as an overall system controller with a CPA. In this example, the
CPA is depicted as performing two known transformations on the input data samples (such
as FFT and cepstrum; CPA 1, 2) in order to extract features, and then three other (generally
non-linear; CPA 3, 4, 5) transformations found by a pre-training routine. In such a
classification system, the micro controls the input samples to the CPA's and loads or instructs
the use of the appropriate coefficient or weight vectors. The output(s) of any CPA may be
further processed by the micro prior to entry into the next CPA. The pre-processing CPA
outputs feed the micro for the overall feature extraction phase. (The micro might be
performing statistical analysis or other routine data reduction for optimum utilization of the
CPA's.) Now the CPA is ready to act in its role of a discriminator and can be loaded in
turn with the coefficients and interconnectivity pre-stored either in the micro's memory or in
the CPA memory itself. The discrimination outputs are again fed to the micro for relatively
simple decision logic to form the basic classification.

Fig. 5. CPA Application to Classification Systems
BASIC ARCHITECTURAL APPROACHES FOR CPA IMPLEMENTATIONS


4.1 GENERAL

In order to achieve a high degree of applicability of the CPA technology, a fast, simple
element with the capability of being stacked, in building block fashion, to realize complex
arrays is necessary. A self-contained element on a chip, with the speed and precision
necessary for many practical applications, is not yet possible. Even recent micro-processor
advances (especially where the micro's architecture was designed for signal processing),
although approaching such a goal, require substantial complexity, especially when used in
the high precision areas. The basic alternatives will be discussed in relation to the
technology required to effectively support each architecture. First, each alternative must be
viewed with respect to the precision with which the element's multinomial is calculated.

Since a wide range of applications was considered, precisions ranging from 8 to 48 bits
and the use of floating point as well as fixed point were considered. Hence, the performance
factors for all approaches are estimated as a function of bit precision. There are basically
two ways to build an element (aside from a micro-processor based architecture): one
centered around the use of serial-parallel type multiplier(s) and the other centered around
the use of all-parallel type multipliers.

The basic distinctions are as follows:

Serial-parallel multiplier - has provision for pre-loading and storing an n-bit multiplier
under clock control. The product appears serially, least significant bit first, with the
most significant bit appearing after 2n clock cycles. Besides the n-bit multiplier register,
the chip contains n full-adder and latch stages, as well as some auxiliary control logic.

Parallel multiplier - accepts the n-bit multiplier and n-bit multiplicand words in parallel,
and produces the 2n-bit product in parallel as a result of asynchronously rippling through
an array of n² full-adders.
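
The serial-parallel behavior described above can be imitated bit by bit. The sketch below is a generic carry-save serial-parallel multiplier for unsigned operands, assumed here for illustration; it is not the circuit of Appendix H (which may, for example, handle signed numbers). It uses n full-adder stages with sum and carry latches, holds the multiplier in parallel, takes the multiplicand serially, and delivers the 2n product bits least significant bit first.

```python
def serial_parallel_multiply(multiplier: int, multiplicand: int, n: int):
    """Unsigned n-bit by n-bit multiply; returns (product, serial output bits, LSB first)."""
    a = [(multiplier >> i) & 1 for i in range(n)]  # pre-loaded multiplier register
    s = [0] * (n + 1)   # sum latches; s[n] is a constant 0 feeding the top stage
    c = [0] * n         # carry latches, one per full-adder stage
    out = []
    for t in range(2 * n):                          # 2n clock cycles in all
        x = (multiplicand >> t) & 1 if t < n else 0  # serial input, zeros after n bits
        new_s = [0] * (n + 1)
        for i in range(n):
            # Full adder: partial-product bit + sum from the next higher stage + own carry
            total = (a[i] & x) + s[i + 1] + c[i]
            new_s[i] = total & 1
            c[i] = total >> 1
        s = new_s
        out.append(s[0])                            # one product bit leaves per clock
    product = sum(bit << t for t, bit in enumerate(out))
    return product, out


product, bits = serial_parallel_multiply(13, 11, 4)
print(product, bits)
```

Each stage needs only a one-bit adder and two latches, which is why several such chips can stand in for one much larger all-parallel array.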
While the all-parallel multiplier is much faster in performing a multiplication, it involves
substantially more complex chip(s). The serial-parallel multiplier approaches equivalent
throughputs for the polynomial element by using several simpler chips. It also permits the
use of serial adders rather than high speed parallel adders, and leads to simplified
interconnect problems.

The multiplier technology has progressed very rapidly in the past three to four years for
both types of multiplier implementations. The unit developed for use in the CPA (Ref. 18
and included in Appendix H) was followed by a comparative analysis of multipliers (Ref. 19)
and developments of small bit capacity, bipolar serial-parallel multipliers (Ref. 20, 21).
The latter, now commercially available, is considered in the tradeoff analysis of alternate
architectural approaches to follow. In the parallel multiplier realm, two large capacity
LSI versions were considered: an 8 x 8 SOS (Ref. 22) and a 10 x 10 bipolar (Ref. 23).
Smaller (4 x 4 arrays or less, such as in Ref. 24), higher speed units were not considered
because of the substantially greater overall complexity of the resultant element and because
of availability and cost. Such units, characterized by multiplication times of less than 40 ns
for 8 x 8 calculations, would ultimately impact the design of very high speed elements,
requiring different architectures and using predominantly ECL parts and bipolar memories.

4.2  CUSTOM  CMOS/SOS  SERIAL-PARALLEL  MULTIPLIER 

The characteristics of the multiplier chosen for use in the mantissa processor of an element
are summarized here. A paper with the complete description of its design and operation is
included in Appendix H.

The chosen design fits the basic requirements of an efficient element. This multiplier and
its use within the element have furthermore shown a generally optimum approach, as will be
seen in the tradeoff analysis, particularly for high precision calculations (even though
alternate circuits and/or architectures excel for certain criteria).

Table 3 summarizes the chip data. It should be pointed out that the top speed - 15 MHz -
falls somewhat short of predicted simulation (Ref. 11) values because of the extra delays in
the full adder. (This design was in fact partly responsible for the excellent packing density
on the chip.) A second generation circuit would not only result in an improved speed (we
would consider this a worthwhile trade with respect to chip area), but would incorporate
some extra control logic to minimize external parts count in the mantissa processor.

TABLE 3

CUSTOM CMOS/SOS MULTIPLIER PERFORMANCE SUMMARY

PERFORMANCE:

FUNCTION: a·x + b
          a, x, b are up to 24 bits in length. Taps provided at 8, 16 bits.
          Cascadable for greater accuracy.

CHIP SIZE: 155 mil x 170 mil

NUMBER OF DEVICES: 1750

PACKAGE: 16-pin (3, 8-bit registers for serial/parallel loading of
         multiplier; multiplicand and addend loaded serially)

TYPICAL POWER DISSIPATION: 5 mW, 5 V, 5 MHz clock
                           80 mW, 10 V, 10 MHz clock
                           300 mW, 15 V, 15 MHz clock

ENERGY CONSUMPTION (FOR MULTIPLYING TWO 16-BIT NUMBERS): 64 nJ

This multiplier has also resulted in efficient implementations of two other types of signal
processing subsystem besides the CPA element. These include two FFT processors (Ref.
25, 26) and a Quadrature Demodulator-Digital Filter (Ref. 27).

Descriptions of the basic architectural alternatives will now be given: one based on the use
of the serial-parallel multiplier (word parallel, bit serial processor) and two based on the
use of the all-parallel multiplier (word serial, bit parallel processors). In the latter
case we consider a micro-processor controller within the element as well as the hard-wired
case. The comparative performance of all alternatives versus precision is given in Section
VI.
4.3 WORD PARALLEL-BIT SERIAL APPROACH


The heart of the polynomial element - the arithmetic unit or mantissa processor - using the
custom chips is shown in Fig. 6. With each cell being one of the custom chips, cell 1
functions as a delay register, cells 2 through 9 generate the product terms (the processor shown
is configured to implement P1), while cell 10 serves as register.

In this approach, generation of different polynomials involves reconfiguring the cell
interconnections. This actually simplifies the control circuitry, as the controller must only
provide control signals to the interconnection gates (not shown in Fig. 6) and output a single
clock burst. This will be seen in more detail in Section V when the brassboard is described.

The word-parallel processor modified for floating-point calculations is shown in Fig. 7. As
in the fixed-point processor, the polynomials are generated by proper interconnection of
cells. The function of the cells in the mantissa section is identical to that of the
corresponding cells in Fig. 6.

An overflow position detector and register, scalers, a left-justifier, and an exponent
processor are provided to perform the necessary floating-point functions.

Since the polynomial terms are added together in parallel, scaling of these terms must be
done in parallel. The scaler information in this case represents the difference between the
largest of the term exponents and the exponent for the given term. Since this number is
always non-negative, the scalers, although there are six of them, are relatively simple
devices consisting of pre-settable down counters. During the scaling cycle, the counters
are counted down to zero, and a clock is enabled to each cell as long as that cell's
scaler is non-zero. When any scaler reaches zero it turns off its clock. The final
addition is then performed on all of the individual terms.
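
The scaling cycle can be modeled in a few lines. This sketch assumes each term is an integer (mantissa, exponent) pair with value mantissa × 2^exponent, and that each enabled clock shifts the term's mantissa right by one bit with truncation; the function name `align_and_add` and that shift convention are illustrative assumptions rather than details taken from the report.

```python
def align_and_add(terms):
    """terms: list of (mantissa, exponent) integer pairs.

    Returns (sum_of_aligned_mantissas, common_exponent).
    """
    max_exp = max(e for _, e in terms)
    mantissas = [m for m, _ in terms]
    # Each scaler is preset to (largest exponent - own exponent), never negative.
    counters = [max_exp - e for _, e in terms]
    while any(counters):
        for i, count in enumerate(counters):
            if count:                 # clock is enabled only while this scaler is non-zero
                mantissas[i] >>= 1    # one shift per clock; low-order bit is truncated
                counters[i] -= 1      # down counter; at zero the clock is turned off
    # Final addition of all aligned terms
    return sum(mantissas), max_exp


print(align_and_add([(8, 3), (8, 1), (5, 3)]))
```

Because all counters run concurrently, the scaling cycle lasts only as long as the largest exponent difference, not the sum of the differences.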

Fig. 7. Word-Parallel Floating-Point Processor

After the final addition the mantissa must be left justified. This is accomplished as follows.
The entire mantissa is shifted into a serial register and the present incoming bit is
compared with the bit received previously. If both bits are the same, a counter is advanced by
one, and if they differ, the counter is reset to zero. The mantissa plus 5 additional bits
(allowance for maximum overflow) are shifted into the register and counter circuit. At
the end of this time the number in the counter indicates the position of the MSB in the
mantissa. This number is then transferred to the exponent circuits so that they can calculate
the final exponent term and is also used to left or right justify the mantissa.
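
The compare-and-count scheme can be illustrated with a short sketch. It assumes, beyond what the text states, that the word is shifted in least-significant bit first and is in two's complement form; the helper names `to_bits_lsb_first` and `redundant_sign_bits` are hypothetical. Under those assumptions the final counter value equals the number of redundant copies of the sign bit, which is the left shift needed to justify the word.

```python
def to_bits_lsb_first(value, width):
    """Serialize a word as a list of bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def redundant_sign_bits(bits_lsb_first):
    """Model the counter: advance on equal neighbouring bits, reset otherwise."""
    count = 0
    previous = None
    for bit in bits_lsb_first:
        if previous is not None and bit == previous:
            count += 1          # same as the bit received previously
        else:
            count = 0           # bits differ (or first bit): reset
        previous = bit
    return count


# Hypothetical 8-bit example (assumed width, for readability only).
word = 0b00010110
shift = redundant_sign_bits(to_bits_lsb_first(word, 8))
justified = (word << shift) & 0xFF
```

The same count works for negative words, since a run of leading ones is detected exactly like a run of leading zeros.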

The exponent section consists of a bit-parallel adder/subtractor (required so that all term
exponents are available before the mantissas are added together), two memories, and
control logic. The control logic implements the flowchart of Fig. 8.

The use of the ones complementor along with the truncation of the least significant bits will
cause a processor error of less than one mantissa LSB for either the bit/word-serial or the
word-parallel floating point processors.

4.4 WORD SERIAL-BIT PARALLEL APPROACH
4.4.1 BIT-PARALLEL MULTIPLIER APPROACH

The architecture of a bit-parallel arithmetic processor is shown in Fig. 9. For fixed-point
calculations, a bit-parallel multiplier and a bit-parallel accumulator form the heart of the
processor. Gating is provided to allow calculations of second- and third-order product
terms.

Latches at the multiplier and adder outputs provide synchronous operation of the processor.
A typical micro-sequence of operations for a 6 term polynomial would be as shown in Table
4. Individual sequences for the various polynomial types would be stored in program
memory and called up when required. Execution time would be basically dependent upon the
cycle time of the multiplier and will be considered in the tradeoff analysis.

For floating-point calculations, two parallel scalers, an overflow detector, and an exponent
processor must be added.

The scalers align the mantissas of the two floating point numbers to be added so that bits of
equal significance are added together.

The overflow detector determines the position of the most-significant bit (MSB) in the
output mantissa, so that this mantissa may be left-justified before storage in the data memory.

Fig. 8. Exponent Processor Flowchart

Left-justification of the output mantissa preserves the accuracy of the element, since the
maximum number of significant bits are stored in the memory.

The exponent processor determines the output exponent, provides information for mantissa
scaling, and adjusts the output exponent for left-justification of the mantissa.

The exponent processor for floating point computation consists of a bit-parallel adder/
subtractor, four bit-parallel memories, and control logic.

The ei store retains the exponent value for the current (i-th) term of the polynomial, while
the eg store retains the exponent for the number already in the accumulator. The processor
loads the scaler registers with the necessary shift information so that each new term may
be properly added to the accumulator contents. The processor keeps track of the
accumulator exponent. When all terms have been accumulated, the eg exponent is adjusted for the
mantissa left-justification and output to the X-Y memory. This approach was elaborated
upon in Ref. 11.
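
The word-serial accumulation described above can be sketched as follows. This is a simplified model under assumptions: values are represented as `mantissa * 2**exponent` with integer mantissas, truncation is done by right shift, and the function name `accumulate` is hypothetical. It shows only how the accumulator exponent is tracked and how the shift amount for each new term is derived from the two stored exponents.

```python
def accumulate(terms):
    """Add (mantissa, exponent) terms one at a time, aligning before each add.

    Returns the accumulator mantissa and the accumulator exponent.
    """
    acc_mantissa, acc_exponent = 0, None
    for mantissa, exponent in terms:
        if acc_exponent is None:
            # First term: load the accumulator directly.
            acc_mantissa, acc_exponent = mantissa, exponent
            continue
        if exponent > acc_exponent:
            # New term is larger: scale the accumulator down to match it.
            acc_mantissa >>= exponent - acc_exponent
            acc_exponent = exponent
        else:
            # Accumulator is larger (or equal): scale the new term down.
            mantissa >>= acc_exponent - exponent
        acc_mantissa += mantissa
    return acc_mantissa, acc_exponent


# Hypothetical example values (assumed, not from the report).
result = accumulate([(12, 3), (8, 1), (6, 2)])
```

Two scalers suffice here because only the accumulator and the current term ever need aligning, in contrast to the six scalers of the word-parallel processor.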

4.4.2 MICRO-PROCESSOR BASED SYSTEM

An alternate implementation of the bit parallel processor would be based on a
micro-processor as the heart of an element. A separate hardware multiply/accumulate unit would be
included within the I/O structure of the micro-processor to reduce the time required for the
arithmetic computations.

The architecture of such a system is shown in Fig. 10. The micro sequences required to
implement any of the polynomials would be stored in the program memory and the processor
could handle all of the required housekeeping functions such as address indexing,
subroutines, jumps, etc. The data switching and multiplexing, which is handled with discrete
parts in the bit parallel approach, would be controlled within the micro. In order to
maintain a balance between complexity and speed the system would be limited to 16 bit precision.

A great variation in system speed, complexity, power, cost, etc. exists and the exact
factors will depend upon the type of micro employed in the system as elaborated upon in Section
VI. The processor could vary from one of the bit slice processors to one of the many 8 bit
processors.

Fig. 10. Micro-Processor Based Element

The bit slice processors offer higher throughput (typical cycle time 100-300 ns) at the
expense of system complexity. Most bit slice processors are microprogrammable and
require a 16 bit wide control memory. Some of the bit slice processors that could be employed
in the system are: the AMD 2900 series, 4 bit bipolar slice; the MMI 5701 series, 4 bit
bipolar slice; or RCA's ATMAC, 8 bit SOS array.

At the other end of the spectrum are the low cost 8 bit processors which have a cycle time
on the order of 2-5 µs. These devices have a fixed instruction set and are generally
simpler to implement. In order to obtain a useful system, double precision arithmetic
operations would have to be used. Some typical devices are: the 8080 and 6800 series, both
NMOS devices, or the 1802 series, which is a CMOS device.

4.5 CPA CONTROL SOFTWARE

This section describes the processor (minicomputer or microprocessor) interface
structure(s) for control of a CPA. The computer must provide the array with sufficient
information (in the form of control instructions and data) to solve the selected problem. Control
instructions and data must be loaded into their respective memories in the proper format
since much time can be saved in the actual processing of the data if it emerges from the
memory in a set manner.


Three different functions will be described:

1. Array command word structure

2. Element control word structure

3. Data memory organization

Reference is made to the functional diagram shown in Fig. 11. The description of the
software interface will be illustrated by a practical example involving an array of five elements
to be programmed for a network configuration as shown in Fig. 12.

Figure 13 summarizes the various types of word structures and format presented in the
following discussion.

4.5.1 ARRAY COMMAND WORD STRUCTURE

This  set  of  words  is  used  by  the  controlling  minicomputer  to  instruct  the  array  about  what 
sort  of  data  is  to  be  given  to  or  taken  from  its  memory,  and  when  to  start,  interrupt  and 
continue  array  execution.  A separate  16  line  bus  is  provided  for  these  commands. 

There  are  three  basic  types  of  commands:  (load/retrieve  data,  execute,  interrupt).  The 
first  is  broken  down  into  three  subgroups: 

1)  Data  transfer,  single. 

This  is  a single  word  command  to  either  load  or  retrieve  a single  data  word  from 
the  address  specified  in  the  data  or  control  memory.  Bits  0 to  9 refer  to  the  ad- 
dress, bits  10  and  11  are  zero  to  indicate  a single  transfer  without  address  exten- 
sions, bit  12  identifies  either  the  control  or  data  memory  and  bits  13,  14  and  15 
form a three bit operation code (02 octal for load, 12 octal for retrieve*).

* The convention used here is to read octal digits starting from the 0 bit position (see top
of Figure 13).

Fig. 11. System Functional Diagram

2) Data transfer, multiple.

This command uses two words: the first is the same as above except that bit 11 is
set to a "1" and indicates that more than one transfer is to take place; the second
word indicates the number of transfers.

3) Extended memory.

This command is used when there is more memory in the array than can be
addressed by 10 bits (1024) and uses two or three words. The first word is
functionally the same as the single word in 1) above, except that bit 10 is set
to "1" and bits 0-9 are ignored. If bit 11 is a "0" (two-word command), the
second word represents the extended address to which the data will be loaded.

If bit 11 is a "1" (three-word command), the second word represents the
number of transfers to be made, and the third represents the extended address (up
to 16 bits).

The execute command word is used to initiate execution at a specific location in the control
memory. Here bits 13-15 are all set to "1" (16 octal) and bits 12-0 represent the address at which
to start execution. Up to 13 bits are available for the address, so no provisions will be made
for an extended address form of this command.

Finally, the interrupt command word is used when for some reason it is desired to stop
execution at some place other than a programmed halt. Provisions are also made for
continuing execution from the interruption point should the need arise. The use of this
command word is anticipated to be very effective in debugging operations. Bits 15-13 again
form the operation code (04 octal for interrupt and 14 octal for continue); bits 12-0 are unused
at present, but are available for future enhancements.


The complete set of command word op-codes is given below:


COMMAND WORD OP-CODES

FUNCTION      CODE (BITS 15-13)    OCTAL
Spare         000                  00
Load          001                  02
Interrupt     010                  04
Spare         011                  06
Spare         100                  10
Retrieve      101                  12
Continue      110                  14
Execute       111                  16
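
The bit assignments described in this section can be checked with a small encoder. This is a sketch, not the controller's actual software: the function names `transfer_command` and `execute_command` are hypothetical, and the polarity of bit 12 (1 selects the control memory, 0 the data memory) is an assumption inferred from the example command words quoted later in this section. The field positions and op-codes are those of the table above.

```python
# Op-codes occupy bits 15-13 of the 16 bit command word.
OPCODES = {
    "load": 0b001,
    "interrupt": 0b010,
    "retrieve": 0b101,
    "continue": 0b110,
    "execute": 0b111,
}


def transfer_command(op, address, control_memory=False,
                     multiple=False, extended=False):
    """Build the first word of a load or retrieve command."""
    if op not in ("load", "retrieve"):
        raise ValueError("transfer commands are 'load' or 'retrieve'")
    if not 0 <= address < (1 << 10):
        raise ValueError("address must fit in bits 0-9")
    word = OPCODES[op] << 13
    word |= int(control_memory) << 12   # assumed: 1 = control memory
    word |= int(multiple) << 11         # more than one transfer follows
    word |= int(extended) << 10         # extended address word follows
    word |= address
    return word


def execute_command(address):
    """Build an execute command; bits 12-0 carry the start address."""
    if not 0 <= address < (1 << 13):
        raise ValueError("address must fit in bits 12-0")
    return (OPCODES["execute"] << 13) | address


# Format as six octal digits, the notation used in this section.
load_control = format(
    transfer_command("load", 1, control_memory=True, multiple=True), "06o")
```

Printing the words in octal makes it easy to compare them against the command sequence given at the end of Section 4.5.3.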

4.5.2 ELEMENT CONTROL WORD STRUCTURE

To provide the array with a means of knowing what to do with all the data stored in its data
memory, control words are provided in the control memory. These control words are 16
bits long and consist of a three-bit operation code (again bits 13-15) and an address of 10
to 13 bits. The operation code specifies which of five polynomials is to be computed (and
some other instructions), while the address tells the processor the location of the first
value in the data memory. The various codes are summarized below:

ELEMENT CONTROL OP-CODES

FUNCTION      CODE (BITS 15-13)    OCTAL
NOP           0 0 0                00
P1            0 0 1                02
P2            0 1 0                04
P3            0 1 1                06
P4            1 0 0                10
P5            1 0 1                12
YDEST         1 1 0                14
HALT          1 1 1                16

The HALT code (16 octal) tells the processor to finish off what it was doing and stop processing
data. This instruction is the normal way in which a series of computations would end. The
code NOP (00 octal) is used in special cases when computations overlap and it is desired to use
the results of one in the very next computation. This occurs during the division algorithm
where a series of P5 polynomials are calculated and each successive computation uses the
results of the previous one. Finally, the YDEST code specifies where in the data memory
the result of the previous computation is to be stored. Multiple destinations are indicated
by more than one YDEST appearing after a polynomial specification. Shown below is an
illustration of a short program corresponding to the network shown in Fig. 12.


LOCATION    CONTENTS    COMMENT
0000        0400001     P2 with data starting at 1 (octal)
0001        1400011     Put result at location 11 (octal)
0002        1400020     Also put result at location 20
0003        0400007     P2 with data starting at 7 (octal)
0004        1400027     Put result at location 27 (octal)
0005        0200012     P1 with data starting at 12 (octal)
0006        1400024     Put result at location 24 (octal)
0007        1010022     P5 with data starting at 22 (octal)
0010        1400026     Put result at location 26 (octal)
0011        1010025     P5 with data starting at 25 (octal)
0012        1400030     Put result at location 30 (octal)
0013        1600000     Halt

4.5.3  DATA  MEMORY  ORGANIZATION 

The organization of the data memory is rigidly structured in order to facilitate the task of
the controller when accessing data for a particular computation. The polynomial weights
are loaded in reverse order (ascending address corresponding to descending weight
index), as are the X values. Table 5 shows how the weights and X values are loaded. Up to
eleven locations are required (for P5) to hold the data for each computation. Since the
controller is capable of depositing the result of any calculation into any address in the data
memory, dummy values or zeros must be loaded if a particular value is not available until
after computation has begun. Note that the control words above instruct the controller as
to which polynomial to compute and which data values to expect and in which order.

TABLE 5. DATA MEMORY ORGANIZATION


CONTENTS    USED IN POLYNOMIALS
w5          P1
w4          P4, P1
w3          P4, P2, P1
w2          P4, P3, P2, P1
w1          P4, P3, P2, P1
w0          P5, P4, P3, P2, P1
x4          P4
x3          P4
x2          P5, P4, P3, P2, P1
x1          P5, P4, P3, P2, P1

A graphic representation of the array's memory content for the specific example considered
is shown in Fig. 14. Note that the instructions loaded into the control memory are taken
from the previous program, while the data memory contains the weight and variable inputs
for each of the polynomials to be computed. These are each 32-bit words (1 bit sign, 23 bit
mantissa, 7 bit exponent, 1 bit exponent sign) and correspond to the same format as used
in FORTRAN and BASIC languages. The values in parentheses represent the 4 inputs to the
simulated network while the result (Y out) is finally stored in location 30.

In conclusion, to program the array for the computation of the polynomial network shown
in Fig. 12 and then instruct the array to execute such computation, the computer will give
the array controller the following command sequence:

1) 0/011/100/000/000/001 = 034001  Load Control Memory, start at 1

2) 0/000/000/000/001/100 = 000014  With 14 (octal) words

3) 0/010/100/000/000/001 = 024001  Load Data Memory, start at 1

4) 0/000/000/000/011/000 = 000030  With 30 (octal) words

5) 1/110/000/000/000/001 = 160001  Execute, starting at 1

6) 1/010/000/000/011/000 = 120030  Retrieve 1 word from location 30

[Figure: CONTROL MEMORY and DATA MEMORY contents]

Note that between commands 2 and 3, and 4 and 5, the actual data transfer takes place over
the data bus.

BRASSBOARD ELEMENT

This section describes, first as an overview, and then in more detail, the design,
construction and test of the brassboard element. This brassboard was based on the serial type
organization with 32-bit floating point precision and was provided with sufficient features
to demonstrate the overall element approach, using a group of the custom serial-parallel
multipliers, as well as to "calibrate" the performance of this architecture against alternate
approaches in the tradeoff analysis. While the heart of the element, the 24-bit mantissa
processor, incorporating internal switching for function control, did not present undue
problems, the exponent processor and associated scalers for justification were found to
require a considerable number of circuit packages based on existing available parts.

Much of the other complexity was due to the overall timing and control which used TTL
parts for flexibility in the initial design. Any future model would use PROM's for
considerable savings in parts.

The actual brassboard is shown in Fig. 15, with the mantissa processor card shown out
of the rack. The eight 24-bit CMOS/SOS (ax+b) chips can be seen.

5.1 OVERVIEW

The CPA brassboard (block diagram shown in Fig. 16) has the capability to evaluate
either a single 6 term polynomial or two 4-term polynomial expressions with 32 bit floating
point precision (24 bit mantissa, 8 bit exponent) and was designed to interface with an
HP 21MX mini-computer. Beside having the capability to evaluate the two polynomials,
the control logic and multiplexing circuits necessary to implement a 4 input, 4 output FFT
butterfly, transversal filter and the recursive division algorithm are also included within
the element as discussed previously. (The addition of two additional output registers is
required for the butterfly, and an iteration counter and associated logic will be required
for the division algorithm.)

Since the breadboard does not have a self-contained memory, all operating parameters
(poly. type, input data x's and coefficients) are stored in the host processor and transferred
to the CPA element as they are required, and the CPA transfers back the Y's after each
implementation. Two sets of data buses are used:

1) 16 bit input bus - computer to CPA

2) 16 bit output bus - CPA to computer

along with two control lines to indicate the status of the bus.

Three different formats are used on the bus (Fig. 17): one for control information and
the remaining two for the 32 bit data word. The bus from the CPA to the computer is
used for data transfer only while the computer to the CPA bus is used for control and data.

The time, or number of transfers, required to load the CPA is dependent upon the function
that is to be implemented and varies from a minimum of 7 to a maximum of 25 word
transfers (see Table 6). All even numbered transfers are data 1 format while all odd numbered
transfers, with the exception of the first transfer, are data 2 format.

In order to transfer data from the CPA to the computer, an output request is generated by
the computer: the 2 bit is raised high, followed by the poly type. The CPA will then
respond with the required number of data transfers, again dependent upon the poly type.
Four transfers are required for the two 4-term polynomials (two Y's and two transfers per
word), with 2 transfers required for the 6 term polynomial and the division algorithm. The
FFT requires 8 transfers.

During the input loading routine, the first transfer contains the op-code and is latched on
the computer interface board. All other transfers are stored in one of two 16 bit
parallel-in, parallel-out registers. Once a full 32 bit word is available (after the third, fifth, etc.
transfers), it is loaded into one of the nine buffer registers (24 bits) and into the exponent
(8 bits) memory. The register that the contents are transferred to is dependent upon the
poly type decoded and upon the particular transfer taking place (see Table 6).

Actually, the buffer registers are loaded twice; the first time all of the multiplicand terms
are loaded and, after the last term has been loaded, the contents are shifted (serial to
parallel transfer with 3 lines per register) into the holding registers of the multiplier chips.
The registers are then reloaded with the multiplier inputs and the constant (W0, etc.) terms.
The contents are then serially shifted into the processor as required.

TABLE 6. CPA LOADING

CONTENTS OF BUFFER REGISTERS ARE LOADED INTO HOLDING REGISTER IN MULTIPLIERS
DURING THIS TIME

* EXECUTION OF PROBLEM START AT END OF TRANSFER

The switch point between multiplier and multiplicand terms is dependent upon poly type and
is indicated by a marker in Table 6.

5.2 MANTISSA PROCESSING

The mantissa portion of the element is organized as a word parallel-bit serial processor
(as discussed previously) that can be re-configured to perform any one of the family of
multinomials.

The operation of the processor is broken into five sections:

1) loading of multiplicand into the holding register located within the multiplier chip
(occurs during transfer cycle between CPU and the element).

2) first level multiplication - for all single product terms

3) second level multiplication - for all cross product or squared terms

4) scaling operation

5) final multiplication and summation

A timing budget is shown in Fig. 18.

5.3 EXPONENT PROCESSING

The exponent processor, unlike the mantissa processor, operates in a word serial, bit
parallel fashion for maximum compatibility with the mantissa processor and is
programmable to operate on any one of the family of multinomials as described in Section IV.

The combination of mantissa and exponent processor constitutes the floating-point arithmetic
unit of the element. As the mantissa processor multiplies the mantissas of two numbers the
exponent processor adds the corresponding two exponents to determine the exponent of the
mantissa product. If two or more multiplications are performed in parallel and the results
added together (as in the case of a polynomial evaluation), all input data must be represented
with mantissa/exponent normalized to the largest of the exponents. Furthermore, each
partial sum of two products has to be checked for possible overflow. Should overflow
occur the corresponding exponent has to be scaled accordingly.

Fig. 18. Timing Budget for CPA Element

The output register board performs the left justification on the mantissa, transferring
the number of overflow/underflow bit positions back to the exponent processor. When an
output command is generated by the computer, the register board selects the first 16 MSB's
of the mantissa and gates them on to the output bus and then selects the 8 LSB's along with
the 8 bit exponent word for transfer back to the computer. The handshaking commands for
the I/O bus are generated by the register board and combined with the gating located on
the computer interface board to control the transfers.
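
The two-transfer output sequence can be illustrated by a short packing sketch. The text states only that the first transfer carries the 16 MSB's of the mantissa and the second carries the 8 LSB's together with the 8 bit exponent; the placement of the mantissa LSB's in the upper byte of the second word is an assumption, and the function names `split_output_word` and `join_output_words` are hypothetical.

```python
def split_output_word(mantissa, exponent):
    """Split a 24 bit mantissa and 8 bit exponent into two 16 bit transfers."""
    mantissa &= 0xFFFFFF
    exponent &= 0xFF
    first = mantissa >> 8                        # 16 MSB's of the mantissa
    second = ((mantissa & 0xFF) << 8) | exponent # assumed byte order
    return first, second


def join_output_words(first, second):
    """Reassemble the mantissa and exponent on the computer side."""
    mantissa = ((first & 0xFFFF) << 8) | (second >> 8)
    exponent = second & 0xFF
    return mantissa, exponent


# Hypothetical example values (assumed, not from the report).
words = split_output_word(0xABCDEF, 0x12)
restored = join_output_words(*words)
```

Two such transfers per result are consistent with the transfer counts quoted in Section 5.1 (two transfers per 32 bit word).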

5.4 DESCRIPTION OF LOGIC DESIGN

The basic brassboard element was built around the CMOS/SOS serial-parallel multiplier
(TCS 039) with all of the control and data handling logic (except for the exponent processor)
using standard TTL parts. The element required 8 wire-wrap boards, used
about 200 IC's and consumed 30 watts.

The board breakdown is:

1) timing

2) control

3) CPA interface (to HP 21MX)

4) buffer register

5) mantissa processor

6) output register

7) exponent processor

8) scalers




5.4.1 TIMING AND CONTROL

These boards control the mantissa portion of the CPA and were built using discrete TTL
IC's in order to maintain maximum flexibility regarding bit size, poly type and numbering
system (fixed point or floating point).

The timing chain consists of a 110 bit serial shift register which is held in the all zero state
until the data transfer between the computer and the element has been completed. After the
last data word has been stored, a "start" command is generated by the interface logic.
This input is applied as the serial input to the register and is clocked into the first stage.
The "start" signal returns to a zero and the "one" in the register is shifted through all the
stages with various outputs being applied to the control board.

Beside the start pulse, the timing board is reset by the interface logic prior to any
operations and also receives the 4 bit op-code which is decoded and latched on the interface
board.


The op-codes used to control the multiplexers which by-pass sections of the timing
chain are as follows:

a) Single or two level multiplications - the FFT or transversal filter (P3 or P4)
requires only a single level multiplier while all other operations require two
cascaded multipliers; therefore, the 24 stages that control the operations of the
second level multipliers can be eliminated during the execution of the FFT.

b) Fixed or floating point - for fixed point systems the 24 stages required for the
scaling operation are bypassed.

c) Combination of the above selects single level multiplication with fixed point
operation.
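
The effect of the bypass options on the length of the timing chain can be sketched as follows. The stage counts (110 in total, 24 for the second level multipliers, 24 for scaling) come from the text; the assumption that execution takes one clock per active stage is a simplification, and the names `active_stages` and `clocks_to_traverse` are hypothetical. The measured execution times are those of the report's own figures, not the output of this model.

```python
TOTAL_STAGES = 110          # full timing chain
SECOND_LEVEL_STAGES = 24    # bypassed for single level multiplication
SCALING_STAGES = 24         # bypassed for fixed point operation


def active_stages(single_level=False, fixed_point=False):
    """Number of shift register stages left in the chain after bypassing."""
    stages = TOTAL_STAGES
    if single_level:
        stages -= SECOND_LEVEL_STAGES
    if fixed_point:
        stages -= SCALING_STAGES
    return stages


def clocks_to_traverse(n_stages):
    """Shift a single "one" through the register and count the clocks."""
    register = [0] * n_stages
    serial_in = 1                       # the "start" pulse
    clocks = 0
    while not register[-1]:
        register = [serial_in] + register[:-1]
        serial_in = 0                   # "start" returns to zero
        clocks += 1
    return clocks


full_chain = clocks_to_traverse(active_stages())
shortest_chain = clocks_to_traverse(
    active_stages(single_level=True, fixed_point=True))
```

Only one "one" ever circulates, so each stage output is active for exactly one clock, which is what lets the control board derive its clocks from selected stage outputs.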

The implementation of the multiplexed switching for this timing and control chain is shown
in Fig. 19, with the execution time for the various conditions shown in Fig. 20. Whenever
the timing chain is broken, a zero is inserted into the unused portions, thus preventing
any "1's" from entering the unused portion of the registers, which could generate false
control signals.

The control board combines the various outputs of the timing chain and generates system
clocks and controls. The following lines are generated:

1) multiplier reset - 2 clock pulses

2) "p" clock for multipliers 1-8, 7 separate lines

3) sign hold for multipliers 1-8, 4 separate lines are generated

4) buffer register clocks - 5 separate lines required for the nine registers

5) output register clock

6) output register control

7) adder/latch clock

5.4.2  COMPUTER  INTERFACE  BOARD  (CIB) 

The CIB is responsible for handling the 16 line input bus and controlling the "handshaking"
operation between the CPU and the CPA element.

The  major  functional  blocks  on  the  CIB  are: 

1)  transfer  counter  and  decode  logic 

2)  op-code  latch  and  decode  logic 

3)  serial  to  parallel  converter  - 2-16  bit  bus  to  single  32  bit  bus 

4)  multiplexer  logic  to  load  buffer  registers 

5)  burst  counter  - load  multiplier 

6)  handshaking  logic 

7)  miscellaneous  logic  to  control  various  timing  pulses 

Initially,  the  transfer  counter  is  held  in  the  zero  state,  either  by  applying  an  external 
master  reset  or  by  the  internal  reset  generated  at  the  completion  of  a cycle,  and  the  CPA 
flag  is  in  the  high  state  (not  busy).  When  the  CPU  has  data  available  on  its  bus,  it  raises 
the  control  line  thus  signaling  the  interface  board  that  data  is  available.  The  transfer 
counter will advance to state one and generate a strobe which will latch the data
(op-code if it's the first transfer). Once the op-code has been latched, the multiplexers
will be set according to the "P" type and thus determine the sequence for loading the buffer
registers. It will also determine when to reset the transfer counter and generate the
start pulse. See Fig. 21 for a load example for the 6 term polynomial.

5.4.3 BUFFER REGISTER BOARD

The register board serves as the intermediate memory required between the host computer
and the mantissa processor of the CPA. It consists of 8-24 bit parallel to serial registers,
and the circuits to perform a 1's complement function.

Since all the numbers generated by the computer are in 2's complement form and since the
TCS 039 serial/parallel multiplier requires that the parallel word be in sign-magnitude
form, any negative number must be converted. Since the multiplicand, the parallel word
in the multiplier, is loaded in a serial-parallel fashion (3 lines with 8 bits per line), a
one's complement is performed on the negative numbers as they are being loaded into the
registers. This is accomplished by latching the sign-bit in an external register and using
the register to control one input of an exclusive-or gate. Therefore, for negative numbers
the output is inverted while for positive numbers no inversion takes place.

This method, while being simple to implement, does detract from the system accuracy
since it introduces an error of one bit in all negative numbers. In the future, it is
recommended that a parallel 2's complementor, with necessary control logic, be placed between
the interface board bus and the buffer register board.
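
The one-bit error can be demonstrated numerically. The sketch below models the exclusive-or inversion on a 24 bit two's complement word; the width matches the mantissa size given in the text, while the function name `ones_complement_magnitude` and the example values are hypothetical. For a negative value v the inverted pattern equals -v - 1, so the magnitude delivered to the multiplier is always one LSB short of the true magnitude.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1


def ones_complement_magnitude(value):
    """Magnitude obtained by inverting the bits of a negative word."""
    raw = value & MASK                  # two's complement bit pattern
    sign = raw >> (WIDTH - 1)           # latched sign bit
    if sign:
        return raw ^ MASK               # exclusive-or with all ones
    return raw                          # positive numbers pass unchanged


# Hypothetical example values (assumed, not from the report).
samples = [-1, -5, -1000, -(1 << 23) + 1]
errors = [abs(v) - ones_complement_magnitude(v) for v in samples]
```

A true 2's complementor would add one after the inversion, which removes the error at the cost of a carry chain; this is the trade-off behind the recommendation above.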

The  loading  sequence  of  the  registers  is  controlled  by  the  interface  board  and  is  dependent 
upon  the  operation  being  performed.  See  Table  7 for  the  loading  sequence  as  a function  of 
"P"  type. 

5.4.4 MANTISSA PROCESSOR BOARD

The processor board is built around the custom SOS/CMOS serial/parallel multiplier,
TCS 039, and is implemented in a word parallel, bit serial approach as discussed previously.

Fig. 21. Load Example (Computer CPA Timing Diagram For 6 Term Polynomial)

TABLE 7. MULTIPLIER-ADDER FUNCTIONS

There are 8 multipliers required, with each multiplier handling two 24 bit numbers and
generating a double precision product (48 bits). In order to achieve the flexibility required
to configure the network to solve various polynomials, I²L and CMOS multiplexers are
provided. Once the "steering" logic has been established, the output products of the
various multiplier cells are added, serially, and stored in an output register.

The functions capable of being handled by the processor were discussed previously. The
methods by which they are generated will be shown in detail. Fig. 22 shows the circuit
positions of all mantissa components - multipliers, adders and multiplex switches (for
poly. control) interconnected for a P1, while Table 7 shows the functions for each multiplier
and adder for each poly type. Fig. 22 shows the waveforms associated with a typical
example, as a problem progresses through the mantissa processor.

5.4.5 EXPONENT PROCESSOR BOARD

The exponent processor board operates in a word serial, bit parallel fashion and contains
its own controller (PROM) to enable it to operate on any one of the four problem types.

The exponent is an 8 bit number and is represented in 2's complement format. The basic
operation of the exponent processor is as follows:

1) it receives and stores the ordered sequence of exponents corresponding to the data
X's and coefficients W's. It is also instructed about the polynomial type to be
evaluated.

2) it generates and stores, as required by the polynomial type, any new exponent of
the type X_i^2 or X_i X_j.

3) it compares all exponents, determines the largest and generates the necessary
number of scalers, one per multiplier output, to be sent to the mantissa processor.

4) it receives overflow indication from the mantissa, computes the correction and
scales the exponent of the result accordingly.

The operation of the exponent processor actually starts with the first transfer from the
CPU to the CPA. The sequence of operation will be reviewed with the aid of the timing
diagram of Fig. 21, and the exponent block diagram of Fig. 24.

*Depending on whether they appear between two multipliers (operating at 10v) or two TTL
adders (operating at 5v).
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Fig.  24.  Exponent  Processor 




The first transfer generates a system reset which is used to reset the PROM controller and
miscellaneous flip-flops on the exponent board as well as other portions of the system.
After the reset has been removed, the controller will cycle through a fixed sequence to
generate and store the constants that may be required during the execution of the problem.
The controller will then switch to an idle state until instructed to advance. At the end of
the third transfer, the X_2 data word is stored in the computer interface board. The word
is split up with 24 bits transferred to the buffer registers for use in the mantissa and the
remaining 8 bits transferred to the exponent processor. This transfer reinitiates the
controller, which stores the X_2 exponent in memory and also proceeds to generate the
exponent for X_2^2. The X_2^2 exponent is then stored in its proper memory location and the
controller, again, turns off. After the fifth transfer, the X_1^2 and the X_1 X_2 terms are
calculated and stored in memory. The next transfer contains a coefficient (W) term, and the
first complete exponent term (a W X^2 product) is now calculated by the processor. This
result is then stored in three different locations:

1)  the  location  reserved  in  memory  for  it, 

2)  the  memory  location  reserved  for  the  largest  exponent  term,  and 

3)  the  outboard  latch  which  provides  one  set  of  inputs  to  the  comparator. 

The sequence for the remaining terms is as follows:

1) calculate the exponent term for the next product

2) store result in assigned memory location

3) compare new term with the previous largest term. The previous largest term is
stored in the latch.

4A) if the new term is larger than the previous largest term, store the new term in the
memory location assigned to the largest exponent term and also store it in the
latch.

4B) if the new term is smaller, halt operations until the next transfer.

This sequence will be repeated for all terms up to and including the last W term. After
that term has been processed, instead of halting operation, the controller will begin a
new sequence of instructions.

This sequence will be used to calculate the difference between the largest exponent and all
other exponent terms and also to load the difference into the proper scaler. This is
accomplished by reading the largest memory location and holding the value in memory
output port A. The adder stage is reconfigured to operate as a subtractor and all previously
stored exponent terms (the W X^2 products, the W X X products, etc.) are subtracted from
the largest term. The output of the subtractor is then loaded into one of the scalers. This
operation continues for all terms, and after the last term the controller halts again. The
exponent processor will remain idle until the overflow/underflow data is fed back from the
mantissa.
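
The exponent bookkeeping described above can be sketched as follows. This is a minimal model under stated assumptions: the term ordering follows the six-term polynomial of Section VII (w0, w1 x1, w2 x2, w3 x1 x2, w4 x1^2, w5 x2^2), the function names are hypothetical, and 8-bit wraparound of the exponent is ignored. It only illustrates the three steps: sum the factor exponents of each product, track the largest with a latch and comparator, then subtract every term from the largest to obtain the scaler loads.

```python
def term_exponents(ex1, ex2, ew):
    """Exponent of each product term: the sum of its factors' exponents."""
    x_part = [0, ex1, ex2, ex1 + ex2, 2 * ex1, 2 * ex2]
    return [w + x for w, x in zip(ew, x_part)]


def scaler_loads(terms):
    """Return (largest exponent, difference to load into each scaler)."""
    largest = terms[0]            # the first term initializes the latch
    for t in terms[1:]:
        if t > largest:           # comparator: new term exceeds the latch
            largest = t
    # Adder reconfigured as a subtractor: largest minus each stored term.
    return largest, [largest - t for t in terms]


terms = term_exponents(2, -1, [0, 1, 3, -2, 0, 4])
print(terms)                # [0, 3, 2, -1, 4, 2]
print(scaler_loads(terms))  # (4, [4, 1, 2, 5, 0, 2])
```

The term with the largest exponent always receives a load of zero, so its product is not scaled.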

The scalers used in the exponent processor consist of pre-settable down counters and
multiplexing logic to control the loading sequence. There are a total of nine scalers, each
of which consists of two 4 bit binary counters, with the pre-set inputs being controlled by a
common bus connected to the exponent output. The individual load commands are generated
on the scaler board and are determined by the "P" type in conjunction with the exponent
board.

After being loaded, the scalers are inhibited until a control signal is generated by the timing
board. This signal inhibits any multiplier clock pulses from being generated by the control
board and turns control of the multiplier over to the scalers. The scaler will provide a
variable number of clock pulses to each multiplier cell depending upon the problem being
executed. The number of pulses will range from a minimum of zero, for the multiplier
cell associated with the largest exponent, to a maximum of 24. Since there are 24 bits being
used in the mantissa, this is the maximum number of scaling pulses required to "zero out"
any number. Regardless of the number of scaling pulses generated, the mantissa processor
control will remain idle until a maximum cycle (24 clock pulses) is complete. After this
time it will assume control and generate the required number of clocks to complete the
multiplication and addition.
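
The effect of the scaling pulses can be illustrated with a short model. It assumes, as a simplification not stated in the text, that each scaling pulse moves a product magnitude one place toward the LSB; the names below are hypothetical.

```python
MANTISSA_BITS = 24  # also the maximum number of scaling pulses


def scaling_pulses(exponent_difference):
    """Pulses issued by one scaler: the loaded difference, capped at 24."""
    return min(exponent_difference, MANTISSA_BITS)


def scale_magnitude(magnitude, exponent_difference):
    """Assumed effect of the pulses: one right shift per pulse."""
    return magnitude >> scaling_pulses(exponent_difference)


print(scaling_pulses(0))                    # 0: term with the largest exponent
print(hex(scale_magnitude(0x400000, 4)))    # 0x40000
print(scale_magnitude(0x7FFFFF, 40))        # 0: 24 pulses zero out any number
```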

5.4.6 OUTPUT REGISTER BOARD

The output register board performs the following:

1) left justifies the 24 bit mantissa and updates the exponent

2) serial to parallel conversion of the mantissa

3) combines the 8 bit exponent with the 24 bit mantissa and transmits the output
words back to the computer.

The present board consists of the following blocks:

1) two 32 bit serial in - parallel out shift registers

2) 2 counters and associated logic to locate the MSB's (MSB detector counters - MC)

3) PROM controller to control the output select and transfer sequence

4) 4-1 multiplexers and latch

By way of describing the operation of the output register, assume that a 6 term polynomial
is being executed.

The  serial  output  of  the  mantissa  processor  is  shifted  into  a 32  bit  shift  register.  As  the 
data  is  shifted  in,  the  present  bit  is  compared  with  the  previously  received  bit  and  if  they 
are  the  same,  a MSB  detector  counter  is  advanced  by  one;  if  they  are  different  the  MC  is 
reset  to  zero.  The  MC  is  used  to  locate  the  MSB  of  the  mantissa,  since,  depending  upon 
the  problem  the  location  of  the  MSB  can  vary  from  5 bit  overflow  to  24  bit  underflow. 
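
The MSB detector counter can be modeled as a run-length counter on the serial stream. The sketch below assumes the mantissa arrives least significant bit first with sign extension into the overflow positions; that ordering, the 29-cycle width (24 mantissa bits plus 5 overflow cycles) and the function name are assumptions made for illustration.

```python
def msb_detector_count(value, width):
    """Shift `value` in LSB first; return the final MSB detector count."""
    count = 0
    previous = None
    for i in range(width):
        bit = (value >> i) & 1
        if previous is not None and bit == previous:
            count += 1        # same as previous bit: advance the counter
        else:
            count = 0         # different (or first bit): reset the counter
        previous = bit
    return count


# +2**-8 held as a sign bit plus 23 fraction bits is the integer 1 << 15.
print(msb_detector_count(1 << 15, 24 + 5))  # 12
```

Under these assumptions the final count is the number of redundant leading bits above the MSB, which is what the board needs in order to left justify the mantissa.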

Assume that the two inputs to the final adder stage (adder #2) on the mantissa are +.9000...
and -.89609375. The difference between these two terms is: +.00390625 (0.0000000100...).
As this number is being shifted into the output register, five extra cycles are provided for
possible overflow; therefore, at the end of the processing cycle the following bit pattern is
stored in the register:


R1A  R2A  R3A  R4A


and the M counter has counted up to 12 (01100₂). At this point, the control logic switches
the output register enable line and this latches the contents of the M counter (this number
is then transmitted to the exponent board for final processing).

In order to simplify the output transfer logic, the sign bit has to be located in bit 0 of one
of the output registers. As shown by the example, the sign bit and MSB are located in the
middle of R2A. In order to align the bits, the 3 LSB outputs of the M counter are decoded and
are used to generate a clock burst. This burst lasts until the 3 LSB's of the M counter are
in a zero state, and for the example shown, four pulses will be generated. The M counter
will now be at 16 (10000₂) and the registers will be as follows:


The two MSB's of the M counter are decoded in order to locate the register that contains
the MSB of the mantissa before the parallel transfer to the computer. The decoding sequence
is shown below:

COUNTER MSB's    MSB REG.

    0 1            R2A
    1 0            R3A
    1 1            R4A
    0 0            R1A
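
The alignment burst and the decode of the two counter MSB's can be checked numerically. The sketch assumes a 5-bit M counter and a burst that runs until the three LSB's are zero (consistent with the count going from 12 to 16 in four pulses); both function names are hypothetical.

```python
def alignment_burst(m_count):
    """Pulses needed until the 3 LSB's of the M counter are zero."""
    return (-m_count) % 8


def counter_msbs(m_count, counter_bits=5):
    """Two MSB's of the M counter after the alignment burst."""
    aligned = (m_count + alignment_burst(m_count)) % (1 << counter_bits)
    return aligned >> (counter_bits - 2)


print(alignment_burst(12))        # 4 pulses, counter ends at 16
print(bin(counter_msbs(12)))      # 0b10
```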

The  following  operations  are  performed: 

0)  the  MSB's  of  the  M counter  are  jammed  into  the  output  multiplexer  sequence 
counter  (OMC)  and  the  outputs  of  this  counter  are  used  to  control  the  4-1  mux. 

At  this  point  the  cycle  is  halted  until  an  output  request  is  received  from  the  computer. 

When  the  output  request  is  received,  a PROM  sequence  is  generated  which  performs 
the  following: 

1) CL1 is generated - latching the 8 MSB's of the mantissa

2) OMC counter is advanced by one; the 4-1 multiplexer outputs are the next
8 MSB's.

3) CL2 is generated - latches 8 bits.

At this point the first 16 bits of the mantissa are available, the PROM sequence is halted
and the output word is transferred to the computer.

When the second output request is received the OMC counter is again advanced, selecting the
8 LSB's of the mantissa, and then the 8 bit exponent is selected and latched. Again, 16 bits
are available for transfer and then the operation is completed.

For the four term polynomial, the sequence is repeated twice; the first pass uses M
counter 1, while the second pass uses M counter 2; 4 transfers are required.
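
The two-word output transfer can be sketched as a pack and unpack pair. The layout below (16 mantissa MSB's in the first word, then the 8 mantissa LSB's in the upper half of the second word with the exponent in the lower half) is an assumption drawn from the latch order described above; the function names are hypothetical.

```python
def pack_output_words(mantissa, exponent):
    """Split a 24 bit mantissa and 8 bit exponent into two 16 bit words."""
    mantissa &= 0xFFFFFF
    exponent &= 0xFF
    first = mantissa >> 8                          # 16 MSB's of the mantissa
    second = ((mantissa & 0xFF) << 8) | exponent   # 8 LSB's, then the exponent
    return first, second


def unpack_output_words(first, second):
    """Inverse of pack_output_words, as the host computer would apply it."""
    mantissa = (first << 8) | (second >> 8)
    exponent = second & 0xFF
    return mantissa, exponent


words = pack_output_words(0x123456, 0x7F)
print([hex(w) for w in words])      # ['0x1234', '0x567f']
print(unpack_output_words(*words) == (0x123456, 0x7F))  # True
```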

5.5 SOFTWARE SYSTEM FOR THE BRASSBOARD ELEMENT

Fig. 25 shows the four major parts to be considered in the operational system for the CPA
element interface with the HP2108. The overall control is provided by the Basic Control
System (BCS) supplied by the computer manufacturer. This module handles all I/O
operations in order to relieve the user from the task of specifying the I/O slot of each
peripheral, thus allowing the user programs to be device independent.

The remaining three modules were written by RCA and are required to control the various
data transfers between the element and the computer. The first of these is "D.GO", otherwise
known as the Driver (listings of all modules are located in Appendix K). Briefly, its function
is to receive data which has been structured and formatted by the "service routine" and
make appropriate calls to BCS in order to deliver and retrieve data from the element.

Control information is also provided to the driver in order to specify how many I/O
transfers are to be made to the element. The driver allows the element to operate on an
interrupt basis, making it compatible with other peripherals.

The next module is the "Service Routine." This module decodes subroutine calls from
the user program to determine which polynomial is desired. From this information,
calculations are made to determine the addresses of the data points, how many data points
there are to be transferred, and how many results are to be recovered from the element.
This and other necessary data are formatted and ordered so as to allow the driver to
transfer them to the element in the shortest possible time. (At present this system does
not use DMA; however only minor changes to the driver would be required in order to
do so.)


Fig. 25. Software System

The final module or "User Program" is written in a high level language in order to make
it convenient for users to write programs using the brassboard element. At present
Fortran is used but there is no reason why Algol or advanced forms of Basic could not
be used. In order to make use of the element the user must dimension three arrays.
Two of these hold the input points (X_1 and X_2) and the W coefficients, while the third
array holds the result of the element's calculation.

A subroutine call is then made to the service routine "Poly" as follows:

Call Poly (N, D(1), C(1), R(1))

where N is the polynomial type, D(1) is the first element of the input array, C(1) is the
first element of the coefficient array and R(1) is the first element of the result array
(this is where the answers are returned). As depicted in Fig. 25, both data and
coefficients may be provided on paper tape or they may be calculated within the program.

Figs. 26 through 35 are the actual logic diagrams of the brassboard element as described.

5.6 BRASSBOARD DEMONSTRATION

A typical system implementation is demonstrated by the program "Array .5" with the actual
CPA configuration shown in Fig. 36. This network involves 15 elements, with elements
1 thru 9 being 6 term polynomials (P1) and elements 10 thru 15 being 4 term polynomials
(P2).

Before the execution of each element the computer transfers the required coefficients and
data points from its memory to the CPA's memory. At the completion of each element's
calculation, the computer retrieves the "Y" from the brassboard and stores it in the proper
location(s) for use at a later time.


Fig. 26. Logic Diagram For Timing Board

[Logic diagrams for the control board, computer interface, and buffer register board]

Fig. 31. Logic Diagram For Exponent Data Path


In order to implement the entire network, 13 passes are required: nine passes, one for each
of the 6 term polynomials, and 4 passes for the six 4 term polynomials. Elements 10 and 11
are implemented simultaneously, but, since before elements 13 or 14 can be implemented
the outputs from elements 10, 11 and 12 are required, element 12 must be implemented
along with a dummy element.

Table 8 lists the coefficients for all the elements and the data base for three different
examples, along with the final output of the CPA for each example.

The final answers have been checked against an HP21 programmable calculator and are
within 0.1% with the differences resulting from

1) one's complementing of negative numbers

2) truncation of mantissa

3) different internal bit precision (over 30 bit mantissa in calculator compared to
24 bits in CPA).


TABLE 8. CPA EXAMPLE COMPUTER CONTROLLED
BRASSBOARD ELEMENT


PUT WEIGHT TAPE IN READER AND PRESS RUN
PAUSE


W0  W1  W2  W3  W4  W5

91073,  - .3  133 A,  - .A3225,  . 37 297 I 7 1 35E  1 , - . 1 2 3E  1 / 

5 28 A9,.2A767El,.IA05j, -.19331, -.32531EI, -.392291-1/ 
17612El,.17035El,-.A3733,-.3S132,-.A1539El,.A7A3E-2/ 
I5A3AE1 , .223A7E1,-.a337,-.A6337,.19313.-.51232E-2/ 
3126E-  1,.27  35  9E1,-.33539,-.91A31.-.3'4  735E1.-.3  33  39E1/ 
. 35 357, . 81279, .59339,. 33 53A,. 23331, .14532/ 

.1745  9,  .91132, .72919, .3335 2,  .1779,  .113 33/ 

124  94E-  1,  .31217, .531  23,  .44 735, -.13771, -.322 9 5E-1/ 

. 1 33 14,  . 33339, .25743, .97913, -.374 4 7E-1,-. 37359/ 
.76397E-2, .3 1533, .49344, . 16947E- 1 / 

4S36E- 1, .3997, .4121 5, -. 7952E- 1/ 

11335, .35 33 3, .13334, -.15413/ 

•20993E-1 , .27321 ,. 7337, .23733 E- 1 / 

.46  130E- 1, .12274, .39234, .53333E-1/ 

51354E-2,  . 1 3331  El, -.337 01, -.3  17251- 2/ 


PUT DATA TAPE IN READER AND PRESS RUN
PAUSE 


I 5085753 
-.2961533 


. 1058372E1 
.3075542 


-.2535337 

.3025527 


RESULT  = -.373995E+33 

PUT  DATA  TAPE  IN  READER  AND  PRESS  RUN 

PAUSE 


.7352853E- 1 
.3101722E-1 


. 3955544E- 1 / 
. 30025  94E1 


.5764495  -.9162722  -.1952316 

-.1417897  -.2332310  .1530646 

RESULT = -.335903E+00

PUT DATA TAPE IN READER AND PRESS RUN
PAUSE 


. 1 1 14777 
.I2B0857E-1 


.6633397E-2/ 

.330335E1 


.6995189  .8936736  -.2145323 

.1384989E-1  -.3285279  .236497SE-1 

RESULT = -.114112E+01

PUT  DATA  TAPE  IN  READER  AND  PRESS  RUN 

PAUSE 


-.263551  1 
.6727391E-1 


-.3561959E-1/ 

.5223333E1 



SECTION VI
TRADEOFF ANALYSIS


The tradeoffs which were conducted considered three basic architecture types (Sec. V)
using a variety of developmental or available parts. These three types include:

1) use of the serial-parallel multipliers operating in word parallel, bit serial form.
This is the type embodied by the brassboard element. For this type, two different
multiplier chips were considered: the RCA CMOS/SOS circuit (TCS 039, 24 bit)
developed at the beginning of this project and a circuit that became available
towards the end of this project, the bipolar, Schottky T²L (AMD 8-bit) part. Besides
the difference in bit length (and a considerably greater difference in power dissipation)
these devices require somewhat different interface logic within the mantissa
processor. They are characterized in the tradeoffs mainly by their difference in
clock rate in the curves that follow, with the TCS 039 considered to operate at 10 MHz
and the bipolar unit considered to operate at 20 MHz.*

2) use of a parallel array multiplier. The element performance using this type of
multiplier was estimated for two state-of-the-art circuits also: the RCA
(developmental) CMOS/SOS 8X8 chip and the TRW (MPY-16) bipolar EFL 16X16 chip.
In this case, one complete parallel multiplier, used in a bit parallel, word serial
fashion, is used as a basis for the tradeoffs. That is, 9 8X8 chips for a 24-bit
element, etc. In the case of the 16X16 chip, the performance was estimated only for
the 16-bit precision fixed point element.

Further assumptions for the all-parallel multiplier organizations in the tradeoffs
are:

using an 8X8 SOS chip
    speed for 8X8 multiply      100 ns
    speed for 16X16 multiply    400 ns
    speed for 24X24 multiply    900 ns

using the 16X16 bipolar chip
    speed for 16X16 multiply    200 ns

Also, the use of the all-parallel 8X8 SOS chip assumes the use of the compatible
family of developmental SOS devices for control, addition, and register storage.

3) use of a microprocessor based element in conjunction with a parallel multiplier.
This differs from case 2) in that a high speed micro (instruction time of about 300 ns
with SOS or bipolar) replaces all the element control logic. A low speed
microprocessor based element (instruction time of about 2 to 3 µs) is later shown in
relation to the other approaches for a specific word length (16-bit).

*Each of these multiplier types will operate at somewhat higher speeds (15 MHz and 30 MHz
respectively), but worst case and timing logic considerations dictated the more
conservative estimates.

Estimates of complexity (a measure of the total number of IC packages - custom LSI, or
otherwise), speed, power dissipation and total chip cost were made with respect to bit
length. Both fixed and floating point were considered for some of the cases. The
complexity factor does not take into account element intraconnection, element
interconnection or overall size. These considerations will of course affect overall system
cost and reliability and give the serial type multiplier architectural approach added
advantage.

Fig. 37 shows the relative complexity for different precisions, for both fixed and floating
point, for the cases of the two types of serial and the 8X8 parallel multiplier. As noted, the
floating point exponent is 4 bits for 8 bit mantissa precision and 8 bits for all other
precisions. From a parts count point of view, the CMOS/SOS 24 bit unit is generally the most
efficient (except for 8 bits, where a single AMD circuit for each multiplier suffices). The
penalty for floating point can be seen in each case. (The present brassboard is the 32-bit
floating point element using the 10 MHz multiplier.)

Fig. 38 gives the speed profiles of the alternate approaches. The speed penalty for floating
point in the all-parallel multiplier approach is minor because parallel shift for binary point
justification is assumed. More than 1 parallel multiplier can be employed to effectively
double the speed, especially for 24 or more mantissa bits. For example, whereas 9 8X8
chips result in an 8 µsec execution period, 18 of them working in 2 independent multipliers
can be used. Complexity and cost would however significantly increase.
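
The chip counts quoted for the all-parallel organization follow from tiling an n-by-n multiply with 8X8 partial-product chips. The sketch below is illustrative; the function names are hypothetical, and treating multiply time as 100 ns per 8X8 chip is an assumption that merely reproduces the 100 ns, 400 ns and 900 ns figures listed in the tradeoff assumptions.

```python
import math


def chips_for_parallel_multiply(bits, chip_bits=8):
    """Number of chip_bits x chip_bits multiplier chips for a bits x bits multiply."""
    per_side = math.ceil(bits / chip_bits)
    return per_side * per_side


def assumed_multiply_time_ns(bits, chip_time_ns=100):
    """Assumed rule: listed multiply times scale with the chip count."""
    return chips_for_parallel_multiply(bits) * chip_time_ns


for n in (8, 16, 24):
    print(n, chips_for_parallel_multiply(n), assumed_multiply_time_ns(n))
# 8 1 100
# 16 4 400
# 24 9 900
```

Doubling throughput with two independent 24-bit multipliers therefore takes 18 chips, which is the complexity and cost increase noted above.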

The following performance estimates consider only fixed point elements and show the
performance of the alternate all-parallel bipolar multiplier as well as the micro-processor
based architecture in relation to the other approaches, and also go on to consider power and
cost. One other important distinction is also made. The succeeding curves consider an
element with sufficient memory so that it can be multiplexed to form a net with up to 20
elements (the multiplexing aspect as discussed earlier in this report).

Fig. 37. CPA Complexity for Various Precisions

The features of each system estimated now are summarized below:


1) Serial-Parallel 24-bit SOS/CMOS multiplier (10 MHz)

memory - high speed RAMs for serial inputs (capable of implementing a minimum
of 20 elements)

mantissa - 8 serial/parallel multipliers with multiplexers to reconfigure the
network to implement the 6 term or 4 term polynomial or an FFT.

output registers - uses tri-state T²L registers.

2) Serial-Parallel 8 bit T²L multiplier (20 MHz)

Similar to approach #1 with the exception of the multipliers used.

These architectures are based on our present breadboard model.

3) Parallel multipliers configuration using the 8X8 CMOS/SOS multipliers

This approach is based on custom LSI devices developed within RCA and includes
the following:

8X8 parallel multipliers (expandable)                 TCS077
18 stage parallel in - parallel out registers         TCS015
8 stage parallel adders                               TCSOOO
1024 X 1 RAM (commercially available)                 MWS5501D
256 X 4 RAM (commercially available)                  MWS4440D
multiplexers (commercially available)                 CDS4066

4) Parallel system using TRW's 16 X 16 multiplier (MPY16A)

Similar to the SOS multiplier system but incorporates T²L parts beside the EFL
multiplier.

5) Micro-processor based system

This approach is based on a high speed micro-processor as discussed, in
conjunction with separate hardware multipliers, adders, etc. for use in signal
processing. Typical of this approach is the new RCA ATMAC, or other bipolar units
soon to be available.

The complexity estimates for these assumptions (Fig. 39) include all necessary control
logic for stand-alone elements. The previous two curves were based essentially on fully
populated CPA's where common control was possible. It is seen that the micro-processor
scheme is advantageous from a total parts point of view, while all other alternatives are
grouped together for 16 bit precisions. A somewhat reverse trend is noted for speed
(Fig. 40).

The costs for the various T²L parts are based on published price lists, while the cost of the
custom SOS devices has been estimated at between $100 and $150 per unit in small to
moderate quantities (up to 100). The component costs depend to a large extent on
production volume. For low to moderate volumes, catalog IC's will always be substantially
less expensive than any customized part such as the SOS or EFL multipliers or high
performance developmental micro-processors. Fig. 41 reflects this trend and shows that all
alternatives are close in cost for 16-bit precision for reasons of component count or custom
circuit usage. The use of the TRW multiplier (a single point on all curves for 16 bits) would
result in a lower cost element because it can be used with commercially available TTL.
Finally, the power trends, Fig. 42, show the obvious advantage of CMOS/SOS based
architectures.

In summary, these curves (Fig. 39 to 42) are based on multiplexed or fully populated CPA
architectures (as opposed to the curves of Figs. 37 and 38). The micro-processor based
element depends largely on the technology used. The power curves would be different for
a bipolar device, while speed, complexity, and cost might not vary that much. Production
volume must be more closely tied in to specific estimates based on developmental LSI
components.

To consolidate this data and put it in perspective with a low cost, low speed microprocessor
based element, Table 9 was prepared. The all-parallel multiplier organization is not
included here since it was generally out-performed by the serial approach. Here, we
consider average complexity and speed, and relative cost. Only the low cost, low speed micro
approach depends on a necessary chip that is not yet available and is assumed for its
performance. This would be a serial-parallel multiplier (similar to those used in approach 1,
but customized for interfacing with the micro). While the defined figure of merit (the lower,
the better) indicates the best efficiency of the low-cost micro (if power dissipation were
included it would show this even more dramatically), the serial multiplier approach, in
today's technology, is about the same in overall performance. It is more likely that any
specific implementation would be chosen on the basis of speed requirement, of course.

TABLE 9. PERFORMANCE OF CPA ELEMENT ARCHITECTURES

Architecture Type                      No. of    Speed     Cost    Figure of Merit
(for 16 Bit Fixed Point)               IC's                        (IC's x Speed x Cost)

1. Serial Multiplier                     33      3 µs      X          99

2. High Speed Signal Proc. Micro.        12      10 µs     X          120
   (16 Bit Org., 300 ns Inst. time)
   (ATMAC or Equiv.)

3. Low Cost Micro-Proc.                   5      150 µs    X/8
   (8 Bit Org., 2-3 µs Inst. time)
   (1802 or Equiv.)

Elements for use in fully Populated Arrays

1. Uses all Available Parts

2. Developmental; Large Connectivity, All-Parallel Circuitry

3. Based on Availability of Compatible Serial Mult. (Under Dev.)
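
The figure of merit in Table 9 can be recomputed from the other columns. The sketch takes the figure of merit to be the product of IC count, speed in microseconds and relative cost (in units of X), which reproduces the 99 and 120 entries; the value printed for the low cost micro is computed here under that assumption and is not read from the table.

```python
def figure_of_merit(ic_count, speed_us, relative_cost):
    """Product of IC count, speed (µs) and cost (in units of X); lower is better."""
    return ic_count * speed_us * relative_cost


rows = {
    "serial multiplier": (33, 3, 1.0),
    "high speed micro": (12, 10, 1.0),
    "low cost micro": (5, 150, 1 / 8),
}
for name, row in rows.items():
    print(name, figure_of_merit(*row))
# serial multiplier 99.0
# high speed micro 120.0
# low cost micro 93.75
```

On this measure the low cost micro and the serial multiplier differ by only a few percent, which matches the remark that the two are about the same in overall performance.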
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SECTION  VII 


ANALYSIS  OF  FIXED-POINT 
PROCESSING  OF  NESTED  POLYNOMIALS 

7.1  INTRODUCTION 


Nested polynomials have been utilized to create nonlinear functions
which "model" the underlying relationships implicit in a "data base"
or matrix of numerical observations [Ref ^]. The data base can be
thought of as consisting of M rows, where each row is an
"observation" of the n + 1 variables y, x_1, x_2, ..., x_n. Algorithms
have been established to "train" or create a nested polynomial
function

    \tilde{y} = p(x_1, x_2, ..., x_n)                    (7.2)

such that \tilde{y} \approx y for each of the M observations in the data
base (where "\approx" denotes some criterion of fitting, such as minimum
square error), and \tilde{y} \approx y is expected to hold for any new
observation (or observation which was not used by the training
algorithm). This latter requirement is sometimes referred to as
"overfit avoidance".

The nested polynomial, \tilde{y} = p(x_1, x_2, ..., x_n), is a composition of
elementary polynomials or "elements". An element, for example,
could be the six-term, two variable polynomial

    y_k = w_{0,k} + w_{1,k} x_1 + w_{2,k} x_2 + w_{3,k} x_1 x_2
          + w_{4,k} x_1^2 + w_{5,k} x_2^2

The elements are used recursively, or "nested", to form the nested
polynomial function.


For  example,  a nested  polynomial  function  of  four  variables  may  be; 


This  can  also  be  expressed  by  a diagram: 


It is, of course, possible to expand this function, that is, to
express it directly as a polynomial in the four variables $x_1$, $x_2$,
$x_3$, $x_4$. Such an expansion will consist of a great many terms, and
is computationally inefficient compared to the nested element
representation.
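As a hedged illustration of the nesting idea, the sketch below composes three six-term elements into a function of four variables. The coefficient values and the names `element` and `nested4` are assumptions made for this example only; they are not taken from the report.

```python
def element(w, x1, x2):
    """Six-term, two-variable element:
    w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1^2 + w5*x2^2."""
    return (w[0] + w[1] * x1 + w[2] * x2
            + w[3] * x1 * x2 + w[4] * x1 * x1 + w[5] * x2 * x2)


def nested4(wa, wb, wc, x1, x2, x3, x4):
    """Two first-layer elements feed one second-layer element."""
    z1 = element(wa, x1, x2)    # first layer, inputs x1 and x2
    z2 = element(wb, x3, x4)    # first layer, inputs x3 and x4
    return element(wc, z1, z2)  # second layer combines both outputs


# Illustrative (made-up) coefficients for the three elements.
WA = [0.1, 0.2, -0.2, 0.1, 0.0, 0.0]
WB = [0.0, 0.1, 0.1, 0.0, 0.2, -0.1]
WC = [0.05, 0.2, 0.2, -0.1, 0.1, 0.1]

y = nested4(WA, WB, WC, 0.5, -0.5, 1.0, 0.25)
```

Only 18 coefficients and three element evaluations are needed here, whereas the expanded polynomial in four variables would carry terms up to the fourth degree.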

Recently, LSI hardware has been developed to compute nested
polynomials of a certain type [Refs. x,x]. This hardware uses 32-bit
floating point arithmetic but the design could be modified to perform
fixed-point computations. Fixed-point hardware is generally
faster, less expensive, and requires less power to operate than
floating point hardware.

7.2  OVERFLOW  AND  ACCUMULATION  OF  ROUND-OFF  ERROR 

There are two potential hazards of fixed-point computation which
must be considered: overflow, or the computing of a number, x,
outside of the range $|x| \le 1$; and the accumulation of round-off
error beyond an acceptable tolerance.*

* Some fixed point processors omit +1 from the permissible range.

In Part 1, it will be shown that by observing certain restrictions
while "training" a nested polynomial to model a given data base,
the resulting nested polynomial is guaranteed never to overflow.
These restrictions are shown to be very mild in that the modeling
capability of the training algorithm is not affected for all
practical purposes.

In Part 2, the accumulation of round-off error will be examined,
and in Part 3, the word-length requirement of a fixed-point nested
polynomial processor will be stated in terms of (a) the accuracy
of the data, (b) the number of "layers" of the nested polynomial
to be implemented, and (c) the restrictions imposed on the
training algorithms. Nested polynomials consisting entirely of
six-term elements (as above) will be considered. Nested
polynomials composed of one or more different types of elements can
be analyzed along similar lines.

Part  1 - Restrictions  on  Model  Training  Which  Prevent  Overflow 

Triangle inequality -- For any two numbers x and y, it is always
true that

$$|x + y| \le |x| + |y|$$

Let x = a + b + c and y = -c. Then:

$$|a + b + c + (-c)| \le |a + b + c| + |-c|$$

$$|a + b| \le |a + b + c| + |c|$$

$$|a + b| - |c| \le |a + b + c|.$$

This form of the triangle inequality will be used in what follows.
Theorem 1:

Let $f(x) = ax^2 + bx + c$, and suppose $|f(x)| \le A$ whenever
$|x| \le 1$. Then:

$$|a| \le 2A, \quad |b| \le 2A, \quad |c| \le A.$$

Proof:

(1) $f(0) = c$, $|f(x)| \le A \rightarrow |c| \le A$.

(2) $f(1) = a + b + c$, $|f(x)| \le A \rightarrow |a + b + c| \le A$.

$$|a + b| - |c| \le |a + b + c| \le A$$

$$|a + b| \le A + |c| \le 2A$$

$$|a + b| \le 2A.$$

(3) $f(-1) = a - b + c \rightarrow |a - b| \le 2A$ as in (2) above.

(4) $|(a + b) + (a - b)| \le |a + b| + |a - b| \le 4A$

$$\rightarrow 2|a| \le 4A$$

$$|a| \le 2A.$$

(5) $|(a + b) - (a - b)| \le |a + b| + |-(a - b)| \le 4A$

$$\rightarrow 2|b| \le 4A$$

$$|b| \le 2A.$$

Theorem 2:

Let

$$f(x_1, x_2) = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2$$

and suppose $|f(x_1, x_2)| \le A$ whenever $|x_1| \le 1$, $|x_2| \le 1$.

Then:

$$|w_0| \le A, \quad |w_1| \le 2A, \quad |w_2| \le 2A, \quad |w_3| \le A, \quad |w_4| \le 2A, \quad |w_5| \le 2A.$$

Proof:

(1) $|f(0, 0)| = |w_0| \le A$.

(2) $f(x_1, 0) = w_0 + w_1x_1 + w_4x_1^2$; $|f(x_1, 0)| \le A$.

Therefore from Theorem 1, $|w_1| \le 2A$, $|w_4| \le 2A$.

(3) $f(0, x_2) = w_0 + w_2x_2 + w_5x_2^2$; $|f(0, x_2)| \le A$.

Therefore from Theorem 1, $|w_2| \le 2A$, $|w_5| \le 2A$.

(4) Finally, to bound $w_3$, consider

$$[f(1, 1) + f(-1, -1)] - [f(1, -1) + f(-1, 1)] = 4w_3$$

Therefore:

$$|4w_3| \le |f(1, 1)| + |f(-1, -1)| + |f(1, -1)| + |f(-1, 1)| \le 4A$$

$$|w_3| \le A.$$


Theorem 3:

Let

$$f(x_1, x_2) = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2$$

and

$$|x_1| \le 1, \quad |x_2| \le 1, \quad |w_0| \le A, \quad |w_1| \le 2A, \quad |w_2| \le 2A,$$

$$|w_3| \le A, \quad |w_4| \le 2A, \quad |w_5| \le 2A.$$

Then

(a) $|f(x_1, x_2)| \le 10A$.

(b) In computing $f(x_1, x_2)$, no number greater in absolute
value than 10A can be formed.

Proof:

$$|f(x_1, x_2)| \le |w_0| + |w_1x_1| + |w_2x_2| + |w_3x_1x_2| + |w_4x_1^2| + |w_5x_2^2|$$

$$\le A + 2A + 2A + A + 2A + 2A$$

Therefore $|f(x_1, x_2)| \le 10A$, and each term in the
sum is bounded by A or 2A.


Restrictions on Training a Nested Polynomial to Model a Data Base --

(a) Data Base Scaling: Each of the n + 1 variables in
the data base must be scaled prior to generating
the nested polynomial. The scale functions $S_1, \ldots, S_n$
must be monotonic and must map the domain of each
input variable into [-1, 1]. The scale function $S_y$
must be monotonic and must map the domain of y into
[-B, B], where B < 1. The purpose of B is discussed
in (d) below. Primes will be used to denote the
scaled values:

$$x_1' = S_1(x_1), \quad x_2' = S_2(x_2), \quad \ldots, \quad x_n' = S_n(x_n), \quad y' = S_y(y)$$

Note that the domain of each of the n + 1 variables
must be estimated from the known occurrences of these
variables (in the data base or elsewhere), or from
known physical limitations.

Once these scale functions are known, the desired
function $y = p(x_1, x_2, \ldots, x_n)$ may be realized by
training a nested polynomial, q, on the scaled data.
That is, if $y' = q(x_1', x_2', \ldots, x_n')$, then the
desired function, p, is the result of the successive
transformations:

$$y' = q(x_1', x_2', \ldots, x_n')$$

$$y = S_y^{-1}(y').$$

In addition to the scaling of data base variables, it
is necessary to scale the element outputs within the
training algorithm and within the resulting nested
polynomial. This intermediate scaling will be discussed
in (e) below.


(b) Element Verification: Each element which may be used
in the nested polynomial, q, must be inspected to
ensure that its coefficients satisfy all of the
following:

$$|w_0| \le 0.1, \quad |w_1| \le 0.2, \quad |w_2| \le 0.2,$$

$$|w_3| \le 0.1, \quad |w_4| \le 0.2, \quad |w_5| \le 0.2.$$

An element which fails this test cannot be used. This
element verification test is the only restriction that
must be added to the training algorithm in order to
guarantee fixed-point realization. (The scaling
requirements of part (a) do not restrict the
performance of the training algorithm.) The effect of the
element verification test on the modeling capability
of the training algorithm will be discussed below.

(c) Word Length Versus Training Algorithm Capability - A
Tradeoff: Theorems 2 and 3 show the relationship of
function boundedness and coefficient boundedness
for the six-term element. This relationship in turn
suggests a tradeoff between processor word length and
training algorithm capability, as larger word lengths
enable greater scaling (smaller values of A in
Theorem 3), which in turn make the element verification
test less restrictive.

(d) Single-Element Model: The purpose of scaling the
y-data into a smaller range than the x-data can be
appreciated by considering a single-element model.
We will choose B = 0.0625. Assume that all variables
in the data base span the range [-1, 1], and that each
number in the data base is exactly expressed in v bits.
Then, in order to scale the y's such that $|y'| \le 0.0625$,
it is necessary to have a word length of at least
4 + v bits ($0.0625 = 1/16 = 2^{-4}$). Now if the
element does not satisfy the verification test of (b),
this implies (by Theorem 2, with A = 0.1) that the
element must somewhere assume a value greater than
0.1 in absolute value within its domain. Since the
element is itself the "model" of
$y' = q(x_1', x_2', \ldots, x_n')$, such an excursion would
imply that the rejected element was, at least in some
region of its domain, more than 60 percent greater
(in absolute value) than the desired model.

Similarly, if B = 1/32, then 5 + v bits will be
required, and only elements which are more than 220
percent greater (in absolute value) than the desired
model may be rejected.

For all practical purposes, B = 1/16 will not unduly
restrict the training algorithm, and therefore 4 + v
bits will be required to guarantee that overflow cannot
occur.
(e) Nested Polynomial Model: In general, the desired
function, q, is approximated by a nest of elements
rather than by a single element. In this case, the
"intermediate variables," or outputs of elements which
are used as inputs to successive elements, must also be
scaled. This is because the ratio of the range of
element output to the range of element input must be
the same for each element, and this ratio is given by
B. In the training algorithm, each element output
which is to be used as an input to successive elements
must be scaled such that its range is nearly [-1, 1].
In the resulting model, the inverse scale functions are
applied to these intermediate variables. By using
scale factors which are powers of 2, the
intermediate-variable scaling reduces to simple shift operations.
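The restrictions (a), (b), and (d) can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the report's training algorithm: the report only requires the scale functions to be monotonic, so the linear map in `linear_scale` is an assumed choice, and all function names are hypothetical.

```python
import math

# Limits on |w0| .. |w5| from the element verification test of (b).
COEFF_BOUNDS = (0.1, 0.2, 0.2, 0.1, 0.2, 0.2)


def linear_scale(lo, hi, half_range=1.0):
    """Return a monotonic map from [lo, hi] onto [-half_range, +half_range].
    (A linear map is an assumption; any monotonic map would satisfy (a).)"""
    mid, span = (hi + lo) / 2.0, (hi - lo) / 2.0
    return lambda v: half_range * (v - mid) / span


def passes_verification(w):
    """Element verification test: every coefficient within its bound."""
    return all(abs(wi) <= b for wi, b in zip(w, COEFF_BOUNDS))


def extra_bits(B):
    """Bits added to the word length by scaling y into [-B, B],
    for B a power of two (4 bits for B = 1/16)."""
    return int(round(-math.log2(B)))


def rejection_margin_percent(B, A=0.1):
    """Percentage by which a rejected element must exceed the scaled
    model range B somewhere in its domain (Theorem 2 with bound A)."""
    return (A / B - 1.0) * 100.0
```

With B = 1/16 the margin evaluates to about 60 percent, and with B = 1/32 to about 220 percent, matching the figures quoted in (d).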


Part  2 - Analysis  of  Rounding  Errors 


The notation follows Wilkinson's Rounding Errors in Algebraic
Processes (Prentice-Hall, 1963). The equivalence sign ($\equiv$) is
used in expressions of the form

$$S \equiv C(a_1 + e_1, a_2 + e_2, \ldots, a_n + e_n)$$

where the numbers $a_1, a_2, \ldots, a_n$ are initial data or previously
computed quantities, and S is the exact value of the computed
quantity, C. We then try to find inequalities which bound the
e's. See Wilkinson's reference to "backwards analysis" for
details.

Rounding Errors in Fixed-Point Arithmetic -- Assume an arbitrary
word length of t bits. There is no rounding error in addition
or subtraction, i.e.

$$\mathrm{fix}(a + b) \equiv a + b$$

$$\mathrm{fix}(a - b) \equiv a - b$$

In multiplication, the rounding error depends upon the rounding
procedure used. We will assume that a 2t-bit product is formed,
$2^{-t-1}$ is added, and the t most significant bits are retained.
In this case,

$$\mathrm{fix}(a \cdot b) \equiv a \cdot b + \epsilon; \quad |\epsilon| \le 2^{-t-1}$$

Note that overflow cannot occur in multiplication, but can occur
in addition or subtraction.
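The multiplication rule can be checked with exact integer arithmetic. The sketch below assumes each operand is stored as an integer count of $2^{-t}$ steps (t fractional bits); that storage convention and the names `fix_mul` and `rounding_error` are assumptions for illustration.

```python
from fractions import Fraction


def fix_mul(a, b, t):
    """Rounded fixed-point product.

    a and b are integers holding the values a / 2**t and b / 2**t.
    The exact product carries 2t fractional bits; adding 2**(t-1) at that
    scale equals adding 2**(-t-1) to the value, and the right shift then
    keeps t fractional bits.
    """
    return (a * b + (1 << (t - 1))) >> t


def rounding_error(a, b, t):
    """Exact error e in fix(a*b) = a*b + e, returned as a Fraction."""
    exact = Fraction(a * b, 1 << (2 * t))
    return Fraction(fix_mul(a, b, t), 1 << t) - exact
```

For example, with t = 8 the operands 128 and 128 both represent 0.5, and `fix_mul(128, 128, 8)` returns 64, which represents 0.25 with no error.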

Rounding Errors in Computing a Six-Term Element -- where

$$|x_1| \le 1, \quad |x_2| \le 1, \quad |w_i| \le 0.2;\ i = 1, 2, 4, 5; \quad |w_j| \le 0.1;\ j = 0, 3.$$

(1) Ideal case: If the processor has the capability of
performing (t x 2t)-bit multiplication and accumulating 3t-bit products,
then rounding takes place only at the very last step, when the
3t-bit result is rounded to t bits. In this case,

$$\mathrm{fix}[f(x_1, x_2)] \equiv f(x_1, x_2) + \epsilon; \quad |\epsilon| \le 2^{-t-1}$$

(2) Near-ideal case: It is likely that a t-bit processor would
not have the ideal-case capability. More reasonable capability
would be (t x t)-bit multiplication and 2t-bit accumulation. For
this case, let fix* denote the 2t-bit result of an operation. Then

$$\mathrm{fix}[f(x_1, x_2)] \equiv \mathrm{fix}^*[f(x_1, x_2)] + \epsilon; \quad |\epsilon| \le 2^{-t-1}$$

Note that:

(a) $\mathrm{fix}^*(w_1x_1) \equiv w_1x_1$

(b) $\mathrm{fix}^*(w_2x_2) \equiv w_2x_2$
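(c) $\mathrm{fix}^*(w_3x_1x_2) \equiv w_3(x_1x_2 + \epsilon_3) = w_3x_1x_2 + w_3\epsilon_3$.

Let $E_3 = w_3\epsilon_3$. Since $|w_3| \le 0.1 < 2^{-3}$ and
$|\epsilon_3| \le 2^{-t-1}$, then

$$\mathrm{fix}^*(w_3x_1x_2) \equiv w_3x_1x_2 + E_3; \quad |E_3| \le 2^{-t-4}$$

(d) $\mathrm{fix}^*(w_4x_1^2) \equiv w_4(x_1^2 + \epsilon_4) = w_4x_1^2 + w_4\epsilon_4$.

Let $E_4 = w_4\epsilon_4$. Since $|w_4| \le 0.2 < 2^{-2}$ and
$|\epsilon_4| \le 2^{-t-1}$, then

$$\mathrm{fix}^*(w_4x_1^2) \equiv w_4x_1^2 + E_4; \quad |E_4| \le 2^{-t-3}$$

(e) $\mathrm{fix}^*(w_5x_2^2) \equiv w_5x_2^2 + E_5; \quad |E_5| \le 2^{-t-3}$

Therefore

$$\mathrm{fix}[f(x_1, x_2)] \equiv \mathrm{fix}^*[f(x_1, x_2)] + \epsilon \equiv f(x_1, x_2) + E_3 + E_4 + E_5 + \epsilon$$

or,

$$\mathrm{fix}[f(x_1, x_2)] \equiv f(x_1, x_2) + E$$

where

$$|E| = |E_3 + E_4 + E_5 + \epsilon| \le |E_3| + |E_4| + |E_5| + |\epsilon|$$

$$\le 2^{-t-4} + 2^{-t-3} + 2^{-t-3} + 2^{-t-1} < 2^{-t}.$$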
Accumulation of Rounding Error -- It has been shown that the
rounding error for one six-term element is bounded by $2^{-t-1}$ in
the ideal case and $2^{-t}$ in the near-ideal case. When these
elements are nested, the rounding error induced by the innermost
elements (i.e., those in the first "layer" when the nest is
expressed in a diagram) becomes input error to the next-innermost
elements (those in the next layer), and so forth.

The effects of input error can be assessed by considering an
arbitrary element with known input error. This can be done for
the ideal case, the near-ideal case, or for other hardware
realizations of an element.

The following analysis is for the near-ideal case; that is, the
case in which the processor accumulates 2t-bit products, scales
the 2t-bit element output by shifting left an appropriate number
of bits (see Part 1(e)), and finally rounds the scaled product
to t bits.

It is necessary to consider rounding error and propagation of input
error simultaneously. Therefore,

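Let

$$f(x_1, x_2) = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2$$

where

$$|w_i| \le 0.2 < 2^{-2} \quad \text{for } i = 1, 2, 4, 5$$

$$|w_j| \le 0.1 < 2^{-3} \quad \text{for } j = 0, 3$$

$$|x_1| \le 1, \quad |x_2| \le 1.$$

Let the inputs actually supplied to the element be

$$\tilde{x}_1 = x_1 + \theta_1, \quad \tilde{x}_2 = x_2 + \theta_2$$

where

$$|\theta_1| \le 2^{-k}, \quad |\theta_2| \le 2^{-k}, \quad k \le t + 1.$$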


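As in the previous section, E's will be used for rounding error.
Here $\tilde{x}_i = x_i + \theta_i$ denotes an input carrying the input
error $\theta_i$, with $|\theta_i| \le 2^{-k}$.

(a) $\mathrm{fix}^*(w_1\tilde{x}_1) \equiv w_1\tilde{x}_1 = w_1x_1 + w_1\theta_1 = w_1x_1 + E_1$

where $|E_1| = |w_1\theta_1| \le 2^{-2} \cdot 2^{-k} = 2^{-2-k}$

(b) $\mathrm{fix}^*(w_2\tilde{x}_2) \equiv w_2x_2 + E_2; \quad |E_2| \le 2^{-2-k}$

(c) $\mathrm{fix}^*(w_3\tilde{x}_1\tilde{x}_2) \equiv w_3(\tilde{x}_1\tilde{x}_2 + \epsilon_3)$

$$= w_3(x_1x_2 + x_1\theta_2 + x_2\theta_1 + \theta_1\theta_2) + w_3\epsilon_3$$

$$= w_3x_1x_2 + E_3$$

where:

$$|E_3| \le |w_3x_1\theta_2| + |w_3x_2\theta_1| + |w_3\theta_1\theta_2| + |w_3\epsilon_3|$$

$$\le 2^{-3}(2^{-k} + 2^{-k} + 2^{-2k} + 2^{-t-1})$$

$$= 2^{-3}(2^{-k+1} + 2^{-2k} + 2^{-t-1})$$

(d) $\mathrm{fix}^*(w_4\tilde{x}_1^2) \equiv w_4(\tilde{x}_1^2 + \epsilon_4) = w_4x_1^2 + E_4$

where:

$$|E_4| \le 2^{-2}(2 \cdot 2^{-k} + 2^{-2k} + 2^{-t-1})$$

$$= 2^{-2}(2^{-k+1} + 2^{-2k} + 2^{-t-1})$$

(e) $\mathrm{fix}^*(w_5\tilde{x}_2^2) \equiv w_5x_2^2 + E_5;$

$$|E_5| \le 2^{-2}(2^{-k+1} + 2^{-2k} + 2^{-t-1}).$$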


Therefore, writing $\tilde{x}_1$, $\tilde{x}_2$ for the inputs that
carry the input error,

$$\mathrm{fix}[f(\tilde{x}_1, \tilde{x}_2)] \equiv \mathrm{fix}^*[f(\tilde{x}_1, \tilde{x}_2)] + \epsilon; \quad |\epsilon| \le 2^{-t-1}$$

$$= f(x_1, x_2) + E_1 + E_2 + E_3 + E_4 + E_5 + \epsilon$$

$$= f(x_1, x_2) + E'$$

where

$$|E'| = |E_1 + E_2 + E_3 + E_4 + E_5 + \epsilon|$$

$$\le 2^{-2-k} + 2^{-2-k} + 2^{-3}(2^{-k+1} + 2^{-2k} + 2^{-t-1}) + 2 \cdot 2^{-2}(2^{-k+1} + 2^{-2k} + 2^{-t-1}) + 2^{-t-1}$$

$$= (2^{-k-1} + 2^{-k-2} + 2^{-2k-3} + 2^{-k} + 2^{-2k-1}) + (2^{-t-1} + 2^{-t-2} + 2^{-t-4}).$$

Therefore $|E'| \le 2^{-k+1} + 2^{-t}$.

In this case, since $k \le t + 1$, $2^{-t} \le 2^{-k+1}$, and

$$|E'| \le 2^{-k+2}.$$

This result says, loosely speaking, that if an element operates
on data with k-bit accuracy, the element output will be accurate
to k - 2 bits at worst.
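The bound on the combined error can be evaluated exactly for particular k and t. The sketch below sums the per-term bounds on $E_1$ through $E_5$ and the final rounding, using exact rational arithmetic; the function names are hypothetical and the code is only a numerical check of the inequalities above.

```python
from fractions import Fraction


def pow2(e):
    """Exact 2**e for any integer e (negative exponents included)."""
    return Fraction(2) ** e


def element_error_bound(k, t):
    """Sum of the bounds on E1..E5 plus the final rounding error,
    for input error 2**-k and word length t."""
    e12 = 2 * pow2(-2 - k)                              # E1 and E2
    cross = pow2(-k + 1) + pow2(-2 * k) + pow2(-t - 1)  # shared factor
    e3 = pow2(-3) * cross                               # E3
    e45 = 2 * pow2(-2) * cross                          # E4 and E5
    return e12 + e3 + e45 + pow2(-t - 1)
```

For instance, `element_error_bound(5, 7)` is slightly below $2^{-4}$, so with t = k + 2 the loss is under one bit, while the looser bound $2^{-k+2}$ holds whenever $k \le t + 1$ (checked here for k of 2 or more).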


Part  3 - Word  Length  Requirements 


If we assume for the moment that the input data to the nested
polynomial are exactly represented with v bits, then we know
(from Part 1) that 4 + v bits are needed to accommodate scaling
and (from Part 2) that an extra 2 bits per "layer" (except for
the first layer) are needed to prevent the propagation of rounding
error through successive layers. Thus, for a nested polynomial
of j + 1 layers, a word length

$$b = 4 + v + 2j$$

is required to maintain v-bit accuracy (i.e., error bounded by
$2^{-v-1}$), given near-ideal hardware.
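As a small worked example of this formula, the hypothetical helper below computes the word length for exact v-bit data and a given number of layers.

```python
def word_length(v, layers):
    """Word length b = 4 + v + 2j for a nest of j + 1 layers,
    assuming exact v-bit input data and near-ideal hardware."""
    if layers < 1:
        raise ValueError("a nested polynomial has at least one layer")
    j = layers - 1
    return 4 + v + 2 * j
```

With 8-bit data, a single element needs 12 bits and a four-layer nest needs 18 bits under this rule.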

Note that in general, the input data are not exact v-bit numbers,
but are v-bit approximations to the true (usually unknown) numbers.
Therefore, v-bit accuracy of the computed result of a nested
polynomial cannot be guaranteed (regardless of the word length) because
of the error inherent in the input data, which is compounded in
successive computations. However, re-examining the total error
(rounding error and propagation of input error) for a single
element shows that

$$\mathrm{fix}[f(x_1, x_2)] \equiv f(x_1, x_2) + E',$$

where (from Part 2)

$$|E'| \le 2^{-k+1} + 2^{-t}.$$

That is, the error is made up of two components: rounding ($2^{-t}$)
and input error propagation ($2^{-k+1}$). If the word length, t, is
sufficiently large, the input-error-propagation term dominates.
In fact, if we use the sharper inequality

$$|E'| \le (2^{-k-1} + 2^{-k-2} + 2^{-2k-3} + 2^{-k} + 2^{-2k-1}) + (2^{-t-1} + 2^{-t-2} + 2^{-t-4})$$

and make the assumptions that $k \ge 5$ and $t \ge k + 2$, then:

$$|E'| \le 2^{-k} + 2^{-k-1} + 2^{-k-2} + 2^{-k-3} + 2^{-k-4} + 2^{-k-6} + 2^{-2k-1} + 2^{-2k-3} < 2^{-k+1}.$$

Therefore, setting k = 4 + v, we see that as long as our word
length, t, is at least 4 + v + 2 = 6 + v, our "per
layer" error is only one bit.

In other words, for a nested polynomial having j layers, with
input data consisting of v-bit approximations to the true
(usually unknown) values, and implemented on a near-ideal
processor with a word length, b, where

$$b \ge 6 + v$$

the maximum error, E', where

$$\mathrm{fix}[f(x_1, x_2)] \equiv f(x_1, x_2) + E'$$


is  bounded  by 


7.3  SUMMARY 


It  has  been  shown  that  fixed-point  arithmetic  can  be  used  for 
nested  polynomial  modeling. 

The  model  training  algorithm  must  be  modified  such  that: 

• The  data  base  is  appropriately  scaled. 

• The  element  coefficients  satisfy  certain  inequalities. 

• The  intermediate  variables  are  scaled. 

Of  these  modifications,  only  the  second  limits  the  capability 
of  the  algorithm.  This  limitation  can  be  made  arbitrarily  less 
restrictive  by  increasing  word  length. 

For  models  consisting  of  6-term  elements,  where  the  data  base 
consists  of  v-bit  approximations,  the  near-ideal  processor 
having  a word  length  of  at  least  6 + v bits  has  a combined  error 
(roundoff  and  propagation)  maximum  of  one  bit  per  layer. 
SECTION VIII


SIMULATIONS OF NONLINEAR MULTINOMIAL NETWORKS
USING FIXED-POINT CPA's WITH VARIOUS WORD LENGTHS


8.1 SUMMARY


The major objective of the work covered in this section was to
investigate via digital computer simulations the generation and
propagation of errors caused by use of fixed-point arithmetic
with finite word length data and finite arithmetic operations
in nonlinear multinomial networks. Subsidiary objectives
were to:

• Investigate the advantages of "near-ideal" or
double-precision element computation.

• Study the effects of intermediate element scaling
by shifting.

• Verify the theories proposed in the previous section
with respect to worst-case error analysis for a
fixed-point network.

• Determine the dynamic range of a fixed-point network
as a function of both word length and the number of
network layers.

The major conclusions reached in this section are:

• The theoretical derivation in Section VII is substantially
confirmed by the simulation results, which reveal less
propagation of errors than anticipated.

• A four-layer network will yield an accurate dynamic
range of three decimal digits with 16-bit fixed-point
computation.

• By employing 8-bit words, a network having two (and perhaps up
to four) layers could accurately resolve a binary solution.

• "Near-ideal" element computation can improve network
accuracy as much as 23 decibels (more than one decimal
digit), compared to standard single-precision solutions,
with little or no added hardware.

• Intermediate element scaling does not improve the
dynamic range of network outputs.
8.2 SIMULATION PROCEDURE


8.2.1 NETWORK STRUCTURE AND SIZE


A four-layer "triangular" network having 15 elements and 16
unique inputs was used throughout the course of the simulation.
Figure 43 illustrates this structure. Note that the output
of any element never feeds more than one element in the next
layer. This triangular configuration was selected on the basis
of generality. In fact, any feed-forward network, no matter
how intricate its connectivity, can always be expanded to the
triangular form.


Each of the elements was a six-term quadratic function of two
inputs expressed as:

$$y = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2$$

where $x_1$ and $x_2$ were the element inputs, $w_0$ through $w_5$ were the
weighting coefficients, and y was the element output. All
weighting coefficients and network inputs were selected at random.
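The triangular structure (8 + 4 + 2 + 1 = 15 elements for 16 inputs) can be sketched as follows. The pairing of adjacent inputs and the ordering of elements layer by layer are assumptions made for illustration, since the figure itself is not reproduced here; the function names are hypothetical.

```python
def element(w, x1, x2):
    """Six-term quadratic element of two inputs."""
    return (w[0] + w[1] * x1 + w[2] * x2
            + w[3] * x1 * x2 + w[4] * x1 * x1 + w[5] * x2 * x2)


def triangular_network(weights, inputs):
    """Evaluate a triangular net in which each layer pairs up the
    previous layer's outputs.

    weights: list of six-coefficient lists, first-layer elements first.
    inputs:  list whose length is a power of two (16 for 15 elements).
    Returns the list of per-layer outputs; the last entry holds the
    single network output.
    """
    layers, values, next_w = [], list(inputs), 0
    while len(values) > 1:
        outputs = []
        for i in range(0, len(values), 2):
            outputs.append(element(weights[next_w], values[i], values[i + 1]))
            next_w += 1
        layers.append(outputs)
        values = outputs
    return layers
```

Because each output feeds exactly one element in the next layer, the element count halves from layer to layer.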


8.2.2  GENERATION  OF  WEIGHTING  COEFFICIENTS  AND  INPUT  VALUES 
(W's  AND  X's) 


The weighting coefficients were bounded by the following limits:

$$-.1 \le w_0 \le +.1$$

$$-.2 \le w_1 \le +.2$$

$$-.2 \le w_2 \le +.2$$

$$-.1 \le w_3 \le +.1$$

$$-.2 \le w_4 \le +.2$$

$$-.2 \le w_5 \le +.2$$

where the inputs never exceeded the domain

$$-1.0 \le x \le +1.0$$

These restrictions are those prescribed in Section VII, and will
always guarantee fractional element outputs in the range

$$-1.0 \le y \le +1.0$$

prior to intermediate scaling of the y's.

The w's and x's were uniformly distributed pseudo-random numbers,
generated by an HP-25 (Hewlett-Packard) programmable calculator
with the following algorithm:

- FHAC  j^(7T  + 

where FRAC represents the fractional part of the bracketed
quantity. The seed of the sequence was equal to +.123456789.

As the numbers were produced, they were mapped into their
appropriate domains and truncated to five decimal places.

Tables 10 and 11, respectively, give the values for the weighting
coefficients of the 15 elements, and values of the four input
vectors used throughout the simulation.

8.2.3 CHARACTERISTICS OF THE DIGITAL COMPUTER

The simulations were performed on an Electronic Associates, Inc.
EAI-640 Digital Computing System. This machine is a parallel
binary computer which operates with a fixed-point, 16-bit
instruction and data word. The FORTRAN IV compiler allows six modes of
internal data representation. Appendix H gives the exact format
of these modes. The x's and w's were stored in the Scaled
Fraction mode, allowing for a 15-bit mantissa, plus sign bit.

A convenient feature of the compiler was the allowance of "In-Line
Assembly Coding," permitting both FORTRAN and assembly language
statements to be interspersed throughout the program. FORTRAN


TABLE 10

RANDOM NETWORK COEFFICIENTS FOR THE 15 ELEMENTS


ELEMENT     w0        w1        w2        w3        w4        w5

   1     -.08721    .14660    .02676   -.00363   -.01220   -.19320
   2      .03765   -.05703    .00134    .09677    .04519   -.19662
   3     -.09765   -.15426    .15460    .06271   -.01232    .07394
   4      .08773   -.08215    .17340   -.09078    .12359   -.0S2C3
   5     -.07302    .10504    .14359    .03109    .08444    .10183
   6     -.05569    .14108    .12103   -.07058    .01190   -.17672
   7      .05176   -.05493    .01419    .06967    .12957    .08815
   8     -.06631   -.19671    .19857   -.04150    .10168   -.01269
   9      .00057    .18045    .08052    .053E3   -.09912   -.1 U64
  10      .04821    .02874    .12C69   -.04S12   -.10702   -.09594
  11      .09918    .01007    .11724    .01608   -.11074   -.12133
  12     -.04791   -.00900    .15225    .03347    .19409    .15545
  13      .02722   -.04660    .07095   -.01275    .16305    .03170
  14     -.05676   -.11660    .16220    .04769    .08697    .12951
  15      .00390    .01510    .06530    .04293   -.15029    .00854
TABLE 11

THE FOUR RANDOM INPUT VECTORS USED IN SIMULATION


Record   x1,x5,x9,x13   x2,x6,x10,x14   x3,x7,x11,x15   x4,x8,x12,x16

   1       -.26338          .38124          .0915           .74762
            .73512          .53310          .40036          .73922
           -.16192         -.37131          .96073          .83810
           -.97022         -.31728          .10235          .37298

   2        .86365          .39380         -.38857          .71173
            .50089         -.86367          .38234         -.79538
           -.59514         -.67376         -.68517          .52935
            .03540         -.75119          .20782         -.72102

   3       -.47227         -.97145         -.92705         -.59671
            .34433          .59799         -.64500          .61461
           -.51813          .59595          .89486         -.73471
           -.37323         -.40664          .01613          .06401

   4       -.90303         -.25230          .81302         -.86339
            .53361          .98970          .95420         -.56541
            .08380         -.78364         -.07408         -.01327
            .’18673         .64049         -.85137          .93473
was used as much as possible; however, it was necessary to write
a few assembly language routines, including double precision add
and multiply systems programs and variable bit mask programs to
perform the "near-ideal" method of element computation at various
word lengths. A listing of the 640 instruction repertoire is
given in Appendix H.

8.2.4 COMPUTATION OF THE "NEAR-IDEAL" ELEMENT

A method which greatly reduces the effects of multiplication
roundoff error during the computation of a six-term element has
been devised. It has come to be named "near-ideal" element
computation because its employment produces "near-perfect"
element outputs when compared to their true values. In essence,
the method simply performs the multiplication and addition of
the x's and w's using double-precision arithmetic. After the six
terms are summed, the double-precision result is rounded to single
precision.

A result  of  these  simulations  shows  that  near-ideal  element 
computation  reduces  the  average  absolute  error  as  much  as  23 
decibels  (more  than  one  decimal  place!)  in  the  network,  when 
compared  to  standard  single-precision  element  computation,  where 
rounding  takes  place  after  every  multiplication. 

Figure  44  illustrates  how  "near-ideal"  elements  were  computed 
in  this  study.  The  letter  t represents  the  word  length  in  bits 
of  the  x's  and  w's.  A multiplication  of  two  t-bit  quantities 
produces  a 2t-bit  result. 

Instead of rounding the products $w_1x_1$ and $w_2x_2$,
their exact products were accumulated in a double-precision
(2t-bit) summing register. For the terms involving two
multiplications ($w_3x_1x_2$, $w_4x_1x_1$, and $w_5x_2x_2$), a 3t-bit accumulator and
a 2t-by-t multiplier would be required to represent exactly
these products. Instead, in the "near-ideal" case, the partial
term products $x_1x_2$, $x_1x_1$, and $x_2x_2$ were rounded to t bits, then
multiplied by their appropriate coefficients to produce 2t-bit
products. These double-precision quantities were then
accumulated in the summing register. The t-bit $w_0$ was directly
accumulated. Following the addition of the six terms, the
double-precision result was rounded to t bits, yielding the
"near-ideal" element output.

The rounding operations were performed by adding one to the
(t+1)-bit position of the double precision quantities, then
truncating the result to t bits.
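The near-ideal procedure can be sketched with plain integers. This is an illustration under stated assumptions rather than the EAI-640 assembly routines: values are held as integer counts of $2^{-t}$ steps, Python's unbounded integers stand in for the 2t-bit summing register, and the names `quantize`, `round_to_t`, and `near_ideal_element` are hypothetical.

```python
from fractions import Fraction
import math


def quantize(v, t):
    """Nearest integer n such that n / 2**t approximates v."""
    return math.floor(Fraction(v) * (1 << t) + Fraction(1, 2))


def round_to_t(p, t):
    """Round a quantity with 2t fractional bits to t fractional bits:
    add one at the (t+1)-th bit position, then truncate."""
    return (p + (1 << (t - 1))) >> t


def near_ideal_element(w, x1, x2, t):
    """Six-term element with double-precision accumulation.

    w, x1, x2 are integers with t fractional bits; the result also has
    t fractional bits.
    """
    # Partial products x1*x2, x1*x1, x2*x2 are rounded back to t bits.
    x1x2 = round_to_t(x1 * x2, t)
    x1x1 = round_to_t(x1 * x1, t)
    x2x2 = round_to_t(x2 * x2, t)
    acc = w[0] << t                    # align the t-bit w0 to 2t bits
    acc += w[1] * x1 + w[2] * x2       # exact 2t-bit products
    acc += w[3] * x1x2 + w[4] * x1x1 + w[5] * x2x2
    return round_to_t(acc, t)          # single final rounding
```

Only the three partial products and the final sum are rounded, which is why the error stays below $2^{-t}$ for coefficients within the Section VII bounds.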

8.2.5 REPRESENTING FEWER THAN 16 BITS WITH A 16-BIT COMPUTER

One  objective  of  this  study  was  to  investigate  errors  produced 
in  the  network  as  a function  of  computational  word  length. 

It  was  therefore  necessary  to  simulate  arithmetic  operations  for 
various  specified  bit  precisions.  The  word  lengths  selected 
were  16,  14,  12,  10,  8,  and  6 bits. 

These  word  lengths  were  simulated  by  masking,  (or  truncating), 
the  x's  and  w's  before  and  after  they  were  involved  in  an 
arithmetic  operation.  The  double-precision  product  formed  in 
a multiplication  operation  was  masked  to  2t  bits. 
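A masking step of this kind might look like the sketch below. It treats the word as a raw 16-bit pattern and keeps the t most significant bits; how the EAI-640 mask routines handled the sign bit is not stated here, so that treatment and the name `mask_to_bits` are assumptions.

```python
WORD_BITS = 16


def mask_to_bits(word, t):
    """Keep the t most significant bits of a 16-bit word, clearing the rest."""
    if not 1 <= t <= WORD_BITS:
        raise ValueError("t must lie between 1 and 16")
    keep = (0xFFFF << (WORD_BITS - t)) & 0xFFFF
    return word & keep
```

For example, masking the pattern 0x7FFF to 8 bits leaves 0x7F00, so the simulated value carries only 8 bits of precision while still occupying a 16-bit word.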

8.2.6  COMPOSITE  AND  COMPUTATIONAL  ERRORS  DERIVED  FROM  TRUE  VALUE 

To compute the error generated in the network as a result of
finite word length arithmetic operations, a "true" value for each
of the 15 elements was computed. Using the double-precision mode
of the EAI-640, the 16-bit scaled-fraction x's and w's were
converted to 62-bit floating-point values (53-bit mantissa plus sign
and 7-bit exponent plus sign). A separate subroutine was written
to compute the entire network in the floating-point, high-precision
mode, while another section of the program computed the network
using short-mantissa, fixed-point arithmetic to the specified
number of bits. A comparison of the fixed-point element outputs
to the double-precision element outputs indicated the degree of
error introduced by the fixed-point operations.

Two types of errors can be generated and propagated through the
network: those caused by round-off after an arithmetic
computation, and those due to finite representations of the input
data to the network. The former will be termed "computational"
error, and the latter "input data" error. The name "composite"
error will be given to those quantities containing both
computational and input data errors.

Both the x's and w's are received with a certain degree of
error, from the point of view that those digital quantities are
approximate representations of continuous functions.

Figure 45 shows the procedure used for finding the composite
error and the computational error in the fixed-point network.
Notice, in the upper diagram, that the input data (x's and w's)
used to calculate the true value remain fixed at 16 bits when the
input data to the fixed-point network are masked to t bits. In
the lower diagram, notice that the input data to both the
fixed-point and 62-bit floating-point networks are always accurate
to the same number of bits.


8.2.7 INTERMEDIATE ELEMENT SCALING

An inherent restriction imposed on the network when using
fixed-point arithmetic is that the coefficients of the elements must
be limited to a fractional range. The theoretical bounds of
these coefficients are established in Section VII. Although these
bounded coefficients guarantee the element outputs to be
fractional, their over-all effect is to decrease the dynamic range
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(a) 


(b) 


- fixc'd-point  element  outiiuts 

- "true"  element  outputs 


HIGURi;  45;  PHOCKilUHE  I'OR  I’lNDTNG  (a)  COMPOSITE  EKHOR 
AND  (b)  COMl'UTATIONAI,  ERROR 
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of  the  element  inputs.  In  one  example,  the  average  absolute 
value  of  the  x's  inputted  to  the  network  was  0.50.  The  average 
absolute  value  of  the  first-layer  element  outputs  had  dropped 
to  0.12.  After  four  layers,  the  average  absolute  value  had 
dropped  to  0.009.  If  a network  were  constructed  having  a few 
more  layers,  the  result  would  have  soon  converged  to  zero. 

To compensate for these ever decreasing element outputs, a
method of scaling the intermediate y's by shifting left was
investigated. Shifting a binary fraction one place left is
equivalent to multiplying by two. Results were generated for
the network with the intermediate outputs shifted zero, one,
and two times. It was found that shifting more than once would
cause overflow at some of the element outputs, which in turn
caused a rather large error to propagate through the net. The
following subsection presents the results of these scale-by-
shifting operations.

8.3 SIMULATION RESULTS AND ERROR ANALYSIS


This section presents a graphical display of the average and
maximum errors generated and propagated in the simulated fixed-
point networks, where the average error for a given layer was
calculated for the ensemble of outputs from that layer.

Figures 46 (a-f) show the average absolute composite error and
maximum absolute composite error as a function of computational
word length for three cases of intermediate scaling: shift = 0
(no scaling), shift = 1, and shift = 2. Figures 46 (g-l)
illustrate average absolute computational error and maximum
absolute computational error as functions of computational word
length for the same three shift values. Each individual figure
shows the average absolute error (either composite or
computational) observed after each layer in the network (solid
lines), and the average absolute value observed after each layer
(dashed lines).

The terms L=1, L=2, etc., refer to the layer numbers for the
average absolute values. All twelve plots were derived from a
network employing "near-ideal" elements.

The number of available data points from which the plots in
Figure 46 were derived varied from layer to layer. The number of
points available was:

    Layer    Number of Data Points
      1               32
      2               16
      3                8
      4                4

The variation was due to the triangular structure of the net. It
follows that the statistical significance of the simulation results
is greater for layer 1 and diminishes with each successive layer.


1/ The maximum absolute error was the largest error observed for
a given simulation trial.

The purpose for plotting the average absolute value on the same
figure with the absolute errors is to indicate the approximate
dynamic range of the element outputs for the various layers.

Notice for all the curves in Figure 46, except those labeled
"shift = 2", that the error at the least significant portion of y
is a function of the computational word length, but the average
absolute value at the most significant portion of y is a function
of the layer. The approximate dynamic range of y is equal to
the average absolute value minus the absolute error, with both
quantities expressed in decibels. For example, from Figure 46 (g)
it is seen that with 16 bits of computational word length the
average dynamic range for a four-layer network is about 60
decibels, or three decimal digits. The same network computed
with 8-bit words yields about 10 decibels for the average
dynamic range. This is better illustrated in Figure 47, where
the average dynamic range observed after each layer has been
plotted as a function of the computational word length. This
figure has been derived from Figure 46 (g) by subtracting the
average absolute error from the average absolute value. Notice
that as the number of layers increases, the dynamic range of the
network decreases. A similar plot could be derived from the
maximum absolute error versus word length to find the "guaranteed"
dynamic range.
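The decibel arithmetic behind these dynamic-range figures can be sketched as below. The 20 log10 amplitude convention is assumed because it is the one under which 60 decibels corresponds to three decimal digits; the function names and the error level used in the usage note are illustrative and are not values read from Figure 46.

```python
import math


def dynamic_range_db(avg_abs_value, avg_abs_error):
    """Dynamic range in decibels: value level minus error level."""
    value_db = 20.0 * math.log10(avg_abs_value)
    error_db = 20.0 * math.log10(avg_abs_error)
    return value_db - error_db


def decimal_digits(range_db):
    """Significant decimal digits corresponding to a range in decibels."""
    return range_db / 20.0
```

For instance, an average absolute value of 0.009 with a hypothetical average absolute error of 0.000009 gives a range of about 60 decibels, that is, three decimal digits.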


The effect of shifting the element outputs one place left is shown
in the middle column of Figure 46. (Those figures labeled
"shift = 1".) Although this scaling procedure increased the
average absolute value of the element outputs, it also increased
the error by an equally proportionate amount. The magnitude of
the number is multiplied by a factor of two, but the error is also
moved into a more significant bit position, causing a zero net
increase of dynamic range. Intermediate element scaling is
therefore not necessary unless the layer outputs are required to
maintain a fixed numerical value.

The results in the third column of Figure 8.4 (labeled "shift = 2")
illustrate the effects of overflow error. In these figures, all
network outputs were shifted two places left (equivalent to
multiplying by four) before being processed through the next
layer. This scaling was too large for some of the intermediate
outputs, causing them to be saturated at 0.9999; hence, large
errors were propagated. The guaranteed dynamic range of
the network thus dropped below zero decibels, as indicated by
Figures 8.4 (f and l). As seen by these diagrams, the effects
of overflow can be quite severe.
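Both effects, the unchanged dynamic range under a one-place shift and the large error caused by saturation, can be illustrated with a small sketch. The `shift_left` helper, the assumption of one sign bit in a 16-bit word, and the sample values are hypothetical and serve only to show the mechanism.

```python
def shift_left(y, places, bits=16):
    """Multiply a fixed-point fraction by 2**places, saturating at the
    largest representable magnitude (just under 1.0)."""
    limit = 1.0 - 2.0 ** -(bits - 1)   # one bit assumed reserved for the sign
    scaled = y * (1 << places)
    return max(-limit, min(limit, scaled))


# One-place shift: value and error are both doubled, so their ratio
# (the dynamic range) is unchanged.
true_y, fixed_y = 0.12, 0.121
error_before = abs(fixed_y - true_y)
error_after = abs(shift_left(fixed_y, 1) - shift_left(true_y, 1))

# Two-place shift of a larger output: 0.30 * 4 = 1.2 cannot be
# represented as a fraction, so the result saturates.
saturated = shift_left(0.30, 2)
overflow_error = abs(saturated - 0.30 * 4)
```

In this sketch `error_after` is twice `error_before`, while `overflow_error` is roughly 0.2, far larger than any round-off error at this word length.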

In comparing the composite error to the computational error, a
difference of approximately 7 decibels existed for all word
lengths of the first and second layer elements, except for 16
bits, where these values should be equal. This difference was
about 10 decibels in the third layer, but dropped to about 3
decibels in layer four. This additional error was caused by the
masked x's and w's inputted to the fixed-point net. The decreased
error after the fourth layer is due to the statistical property
of the random network, which caused the fixed output, "y," to
approach zero.


The  composite  error  may  not  accurately  reflect  the  exact  level  of 
error  to  be  expected  with  a network  in  which  the  coefficients 
are  optimized  by  a numerical  search  procedure. 


An interesting observation made with the fixed-point network
simulations is that the error did not propagate from one layer to
the next. In fact, the error (both average absolute and maximum
absolute) after the fourth layer was generally smaller than that
of the previous layers. This can be explained by the fact that
random errors can be both positive and negative, and since the
final network output represents the summation of many thousands
of terms, the probability that these errors cancelled one another
is high. Also, when using fixed-point arithmetic, there are no
exponents in the numbers to automatically scale errors to higher
significant positions. The errors therefore tend to remain
confined to the least significant bits.
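The cancellation argument can be illustrated numerically. The sketch below assumes, purely for illustration, that each round-off error is independent and uniformly distributed over plus or minus half a least significant bit; the report itself states only that the errors are random in sign. The names and the seed are hypothetical.

```python
import math
import random


def accumulated_error(n_terms, lsb, rng):
    """Sum n_terms independent round-off errors, each uniform in
    [-lsb/2, +lsb/2]."""
    return sum(rng.uniform(-lsb / 2, lsb / 2) for _ in range(n_terms))


rng = random.Random(1)        # fixed seed so the run is repeatable
lsb = 2.0 ** -16              # weight of the least significant bit
n_terms = 10_000

observed = abs(accumulated_error(n_terms, lsb, rng))
worst_case = n_terms * lsb / 2               # every error extreme, same sign
typical = lsb * math.sqrt(n_terms / 12.0)    # one standard deviation of the sum
```

Under this model the typical accumulated error grows with the square root of the number of terms, whereas the worst case grows linearly, which is consistent with the observed errors falling well below a worst-case bound.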

As further verification of the findings of this study, the w's
and x's were all given positive polarities in additional trials
to assure that the network output would be a positive number.

The results were essentially identical to those just described,
except that the first layer errors were about 6 decibels larger
for the composite errors, and about d decibels larger for the
computational errors (compared with the result shown in Figure
8.4). Also, with input x's being positive, the element outputs
were larger in magnitude and, hence, shifting could not be
performed without overflow.

Figure 8.6 shows the theoretical maximum composite error as a
function of word length for each layer in a four-layer network.

This diagram was produced from the equation shown on the figure
which was derived in Section 7. The theoretical error for layer
one is almost identical to the maximum computational error observed
in the simulations (Figure 8.4-h), but is about five decibels
(one bit) less than the observed maximum composite error (Figure
8.4-b). This difference of about one bit is believed to have been
caused by the computer system software when converting input data
from scaled-fraction to double-precision mode. Consequently, the
observed first layer results of the figures showing computational
error (Figures 8.4, g-l) may be (at most) one bit larger than the
actual error.

It is also seen from Figure 8.6 that the composite error increases
as the number of layers increases, due to the propagation of the
feed-forward input error. The maximum error propagation is shown
to be 21 decibels between the outputs of Layer 1 and Layer 4. The
simulation results have indicated, however, that error cancellation
occurs if the computational errors distribute themselves in a
random fashion. Hence, the actual error propagation between layers
can be considerably less than the expected worst case.

FIGURE 8.6: THEORETICAL MAXIMUM COMPOSITE ERROR FOR "NEAR-IDEAL"
HARDWARE
(Axes: theoretical maximum composite error in dB versus word length
in bits. This figure was derived from the equation for |E'| on
page 116, and represents "near-ideal" computation.)

Since the theoretical derivations of Section VII are based on a
"worst case" analysis, it is not surprising to see a somewhat
sizeable difference between the theoretical and observed maximum
errors of the second, third, and fourth layers. For a network
having non-random coefficients, and with non-random inputs, it is
not known how the errors will distribute themselves, but such
a network should be a prime consideration in future simulations.

In comparison to the random network studied herein, the theoretical
findings of Section VII are seen to be conservative with respect to
error propagation.


8.4 CONCLUSIONS AND RECOMMENDATIONS


The following conclusions have been reached as a result of the
simulation described in this section.

• The theoretically-derived conclusions presented in Section VII
have been substantially confirmed, except that in the simulated
case the worst-case error propagation was not observed.
It is felt that this more favorable outcome resulted from
cancellation of random errors within the simulated network.

• For applications requiring only a binary discrimination, as
in the case of a pairwise-voting classifier, eight bits would
be sufficient for computing a two-layer (shallow) network,
and perhaps a three- or four-layer network, but fewer than
eight bits is not recommended for any application.

• A four-layer (moderately deep) network of six-term nonlinear
multinomial elements has an accurate dynamic range of three
significant decimal digits if computation is performed with
16-bit fixed-point "near-ideal" arithmetic. Smaller networks
yielding greater precision, or larger networks yielding less
precision, are also possible with 16 bits. For applications
requiring greater accuracy, a larger word length will be needed.

• Deep networks of up to eight layers should be realized using at
least a 24-bit mantissa to obtain three significant decimal
digits of resolution at the network output.

• "Near-ideal" element computation improves the network accuracy
as much as 23 decibels. (Implementation of this type of
arithmetic requires little or no additional hardware compared with
the standard method of computation.)

• If no intermediate scaling is performed in a CPA network, the
expected range of the element outputs decreases as the number
of layers increases. More than 20 decibels of attenuation
have been observed going from the first layer output of the
network to the fourth layer output. However, this decrease is
not a great hindrance; it can be mitigated at the least
significant portion of the element outputs by keeping error
propagation at a minimum. In fact, the errors observed in
the fourth layer outputs were almost always smaller than
errors observed at the previous layer. (Without shifting,
the computational error remains confined and is not moved
into higher-order bit positions.) It was found that the
dynamic range, or number of significant figures of the
element outputs, remained constant, with or without
intermediate scaling.

• Propagation error caused by intermediate element overflow
was observed to be quite severe on the final network output.
Errors of this type tend to have a "snowballing" effect and
should be avoided.

It is recommended that the simulation results be verified using
simulations of a fixed-point network whose coefficients are
selected, or "optimized," from training data. The statistical
distribution of errors for such a case may not be as random as
we have seen with the contrived network, and hence may cause a
greater error propagation through the layers.

LINEAR AND NONLINEAR FILTERING WITH CPA's

9.1 INTRODUCTION

The architecture of configurable polynomial arrays is very
well suited for the implementation of a broad class of digital
filters, both transversal and recursive. Procedures exist
for implementing transversal filters, fast Fourier transforms,
impulse-response convolution filters, etc., with various LSI
arrays. (See, for example, References 4 and 5.) This section
presents procedures for implementing three common configurations
of recursive linear filters with CPA's.

Section 9.3 presents a novel concept of trainable, nonlinear
filters. In many signal recognition and classification problems,
the key waveform features are nonlinear properties of the signal
frequency spectrum. A complicating factor is that the nonlinear
property may not be known a priori and must be "learned" from
empirical data. Adaptive learning network (ALN) procedures may
be used to synthesize an appropriate nonlinear filter from a
data base. In turn, these filters are easily implemented by
CPA's. By way of example, an ALN is trained to estimate a
parameter which is known (to us) to be the energy within a certain
frequency band of a signal. The resulting ALN algorithm is
compared to a conventional frequency-band energy estimator to show
that the accuracies and computational costs of trained ALN models
are comparable to a priori conventional implementations even if
the actual transformations are known.


In summary, configurable polynomial networks are appropriate for
the implementation of conventional, linear filters (both transversal
and recursive) but are also capable of implementing more general
nonlinear filters. Adaptive learning network procedures may be
used to synthesize the mathematical forms of the nonlinear
filters from empirical data.

9.2 PROGRAMMABLE POLYNOMIAL NETWORK IMPLEMENTATIONS OF
RECURSIVE DIGITAL LINEAR FILTERS


9.2.1 CONVENTIONAL STRUCTURES


Direct Form -- recursive linear digital filters may be written
as transfer functions in the z-domain:

$$H(z) = \frac{y(z)}{x(z)} = \frac{a_0 + a_1 z^{-1} + \cdots + a_N z^{-N}}{1 + b_1 z^{-1} + \cdots + b_N z^{-N}} \qquad (9.1)$$

or

$$H(z) = \frac{\sum_{n=0}^{N} a_n z^{-n}}{\sum_{n=0}^{N} b_n z^{-n}}, \quad b_0 = 1 \qquad (9.2)$$

where $z^{-1}$ is the delay operation, N is the order of the transfer
function, and $a_n$ and $b_n$ are constant coefficients. This form
of the filter equation is called the direct form. Hardware or
software may be constructed, as shown in Figure 49, to implement
this direct form.
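In software, the direct form amounts to the difference equation obtained by cross-multiplying the transfer function, y[k] = sum of a_n x[k-n] minus the sum over n >= 1 of b_n y[k-n], with the leading denominator coefficient taken as 1. The following is a minimal sketch of that recursion; the function name and argument layout are illustrative, not taken from the report.

```python
def direct_form_filter(a, b, x):
    """Run the direct-form recursion for H(z) = A(z) / B(z).

    a = [a0, ..., aN]     numerator coefficients
    b = [1, b1, ..., bN]  denominator coefficients (b[0] assumed to be 1)
    x = input samples; returns the list of output samples.
    """
    y = []
    for k in range(len(x)):
        acc = 0.0
        for n, a_n in enumerate(a):      # feed-forward terms a_n * x[k-n]
            if k - n >= 0:
                acc += a_n * x[k - n]
        for n in range(1, len(b)):       # feedback terms -b_n * y[k-n]
            if k - n >= 0:
                acc -= b[n] * y[k - n]
        y.append(acc)
    return y
```

For example, a single pole with b = [1, -0.5] and a = [1] responds to a unit impulse with the decaying sequence 1, 0.5, 0.25, and so on.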


In many applications it is desirable to implement the transfer
functions in different forms so as to improve numerical accuracy
and/or computational stability margin, which may be troublesome
due to finite word lengths in digital computers. Two prominent
alternate forms are the parallel and orthogonal polynomial forms.

Parallel Form -- With the parallel form, Equation 9.2 is factored,
by partial fractions, into a parallel sum of M linear and L
quadratic transfer functions:

[Equation 9.3]

where the total degree N of the characteristic polynomial is,
of course, the same for both the direct and parallel forms:

M + 2L = N (9.4)

The standard construction for the parallel form is shown in
Figure 50.


Orthogonal Polynomial Form -- In the orthogonal polynomial form,
characteristics (denominators) of the transfer functions are
written as a set of N equation pairs. For example, the
normalized orthogonal polynomial form may be described by:

$$u^{+}_{n-1} = c_n u^{+}_{n} - k_n u^{-}_{n-1} z^{-1}$$
$$u^{-}_{n} = k_n u^{+}_{n} + c_n u^{-}_{n-1} z^{-1}, \qquad n = 1, \ldots, N \qquad (9.5a)$$

and a boundary relation between the input x and $u_0$ (9.5b),
where the k's and c's are constant coefficients and the $u^{+}_n$
and $u^{-}_n$ are intermediate variable pairs. The numerator
portion of the transfer function is given by the polynomial of
Equation 9.5c.

These equations (9.5a, 9.5b, and 9.5c) give rise to the ladder
structure shown in Figure 51.


9.2.2  Polynomial  Network  Structures 


All polynomial representations of linear filters may be
implemented in configurable polynomial networks. Net structures for
the direct, parallel, and orthogonal polynomial forms of Equations
9.2, 9.3, and 9.5 are shown in Figures 52, 53 and 54,
respectively. All calculations can be performed with two
configurations of elements, here called P5 and P6. Both P5 and P6 are
four-input elements; however, P5 has two outputs and P6 has only one:

P5: $y_1 = w_1 x_1 + w_2 x_2, \quad y_2 = w_3 x_3 + w_4 x_4$ (9.6)

P6: $y = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$ (9.7)

where the y's are the element outputs, x's are the inputs, and
w's are constant coefficients.
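The two element configurations can be sketched as plain functions. The forms used here, two independent two-term sums for P5 and one four-term sum for P6, are assumed from the description of the elements as four-input devices with two and one outputs respectively; the function names are illustrative.

```python
def p5(x, w):
    """Two-output element: two independent two-term weighted sums."""
    y1 = w[0] * x[0] + w[1] * x[1]
    y2 = w[2] * x[2] + w[3] * x[3]
    return y1, y2


def p6(x, w):
    """Single-output element: one four-term weighted sum."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))
```

With inputs (1, 2, 3, 4) and unit weights, `p5` returns the pair (3, 7) and `p6` returns 10.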


1/ P6 is obtained from the element $y_{IB}$ (Sections 1 and 2 of
this report) by setting $w_0 = 0$.

FIGURE 50: LINEAR DIGITAL FILTER, PARALLEL FORM, STANDARD
CONSTRUCTION

FIGURE 51: LINEAR DIGITAL FILTER, ORTHOGONAL POLYNOMIAL FORM,
STANDARD CONSTRUCTION

FIGURE 53: LINEAR DIGITAL FILTER, PARALLEL FORM, POLYNOMIAL
NETWORK CONSTRUCTION (figure label: DENOMINATOR CALCULATIONS)

FIGURE 54: LINEAR DIGITAL FILTER, ORTHOGONAL POLYNOMIAL
FORM, POLYNOMIAL NETWORK CONSTRUCTION (figure label: LADDER
CALCULATIONS)

Number of Elements Required for Each Form -- The number of
elements, E, required for an N-pole filter is generally as follows:

Direct Form: $E_d = (N+1)/2$

Parallel Form: $E_p = N$

Orthogonal Polynomial Form: $E_o = N + (N+1)/4$

For the direct form, an (N+1)-term series must be computed for
the numerator and for the denominator. Since four terms are
summed per element (P6), each summation requires (N+1)/4 elements,
so the total element count is (N+1)/2.

For the parallel form, all characteristics are either single or
double pole. Single-pole characteristics require a two-term
summation, so two characteristics may be computed in a single
P5 element. Double-pole characteristics require a three-term
summation, which subsumes a full P6 element. Thus one element
implements two poles, whether single or double, and N/2 elements
are required to calculate the characteristics portion of the
transfer function. The numerator calculations involve a two-term
summation for single poles and a three-term summation for double
poles, thus requiring a maximum of N/2 elements. The
total element count is N.

For the orthogonal polynomial form, a full element (P5) is
required for each pole, or for each "rung" of the ladder. The
numerator calculation involves an (N+1)-term series, so (N+1)/4
P6 elements are needed. Thus the total element count is
N + (N+1)/4.

Number of Layers Required for Each Form -- Both the direct and
parallel forms may be implemented in two-layered networks of
elements, but the orthogonal polynomial form requires an (N+1)-
layered network. For the characteristics computation of the
direct and parallel forms, the output of each element does not depend
upon the output, within a given iteration of the filter, from any
other element. Thus all characteristic calculations may be done
in parallel. With the orthogonal polynomial form, however, the
input to each rung of the ladder includes the output of the prior
rung. Thus the rungs must be computed in series. (The dependent
paths are shown in dotted lines in Figures 52, 53 and 54.) In all
three forms, the numerator calculation is dependent upon the
characteristic calculations, so one additional layer is necessary.
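The element and layer counts discussed in this subsection can be collected into two small helper functions. This is a sketch of the stated formulas only; the function names are illustrative, and the counts are returned exactly as the formulas give them (a physical array would presumably round fractional counts up, which is an assumption not stated in the text).

```python
def element_count(form, n_poles):
    """Number of elements E for an N-pole filter, per the stated formulas."""
    if form == "direct":
        return (n_poles + 1) / 2               # two (N+1)-term sums, 4 terms per P6
    if form == "parallel":
        return float(n_poles)                  # N/2 characteristic + N/2 numerator
    if form == "orthogonal":
        return n_poles + (n_poles + 1) / 4     # one P5 per rung + numerator P6's
    raise ValueError("unknown form: " + form)


def layer_count(form, n_poles):
    """Minimum number of layers L for each form."""
    if form in ("direct", "parallel"):
        return 2                               # parallel characteristics + numerator
    if form == "orthogonal":
        return n_poles + 1                     # N serial rungs + numerator
    raise ValueError("unknown form: " + form)
```

For a seven-pole filter these give 4, 7, and 9 elements and 2, 2, and 8 layers for the direct, parallel, and orthogonal polynomial forms, respectively.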

The minimum number of layers L required for each form is thus:

Direct Form: $L_d = 2$

Parallel Form: $L_p = 2$

Orthogonal Polynomial Form: $L_o = N + 1$

The maximum speed of computation, given a fully parallel network,
is limited by L times the time required for one element computation.

9.3 TRAINABLE NONLINEAR FILTERS


9.3.1 CONCEPT

In conventional filter design there are generally two underlying
assumptions. First, it is assumed that the precise nature of the
operation to be performed on a signal is known. Second, the
signal conditioning is generally a linear operation. (There are
certain notable exceptions such as waveform hard-limiting in FM
demodulation; but even in this example, it is the zero-crossing
timing which is of interest -- no information is being extracted
from the signal in the limiting process.) If the nature of the
transfer function is known and linear, there are many
straightforward approaches to designing appropriate filters.

However, in certain interesting classes of signal processing
applications, the description of the desired filter may not be
known, or if it is known, it may be nonlinear and exhibit
instability when implemented directly. If there are empirical data
which represent the class of signals to be operated on in such
cases, adaptive learning network (ALN) procedures may be used
to synthesize filter forms which estimate or predict the desired
signal parameter. (See the Appendices for discussion of general
ALN methods.) The filters are transversal and are therefore always
stable. Questions of stability associated with recursive filters
are avoided.

The  remaining  sections  illustrate  a procedure  for  using  ALN 
training  techniques  to  synthesize  a nonlinear  filter  from 
empirical  data.  The  example  used  involves  estimating  the  energy 
within  a specified  band  of  a broad-band  signal.  Though  the  design 
of  a conventional  energy  estimator  is  straightforward,  this 
knowledge  was  not  used  in  the  ALN  training  process  --  an  important 
consideration. An after-the-fact comparison is made, however,
between the ALN and conventional systems.

9.3.2 DATA BASE SYNTHESIS


The data base used for designing and for subsequent testing of
the ALN was synthesized in the following manner. The ALN was
trained to mimic the characteristics of an eight-pole Butterworth
filter whose 3 dB cutoff frequencies, $\omega_l$ and $\omega_h$, were
equal to 1 and 2, respectively. A normalized frequency range of 0
to 10 was used. (So, for example, a 30 kHz signal could be mapped
into this range by interpreting it as divided by 3 kHz.)

The frequency response of an ideal eight-pole Butterworth
continuous filter is shown in Figure 55.

To form the data base, three sets of 130 time signals, $x_j(t)$,
j = 1, ..., 130, were generated. The need for three sets is due
to the ALN synthesis and testing procedures. Each signal (of
each set) was constructed as a weighted sum of up to five sinusoids:

$$x_j(t) = \sum_{i=1}^{I} a_{ji} \sin(\omega_{ji} t + \phi_{ji}) + \alpha_j \qquad (9.8)$$

where the definitions and ranges of the parameters in this
equation are as follows:

    $\alpha_j$       = Bias Term:   Uniform Random                -0.5 to +0.5
    $a_{ji}$         = Amplitude:   Uniform Random                 0 to 1
    $\phi_{ji}$      = Phase:       Uniform Random                 0 to 2π
    $\omega_{ji}$    = Frequency:   Uniform Random on Log Scale    0.2 to 10

Uniform random variables were chosen on a log ω scale so that
ω would have more values at the low end of the 0 to 10
range and, hence, near the bandpass region.

The 130 signals (per set) were generated from Equation 9.8 with
I given as:

    Value of I    Type of Signal Generated    Number of Signals
        1         Pure Tone                          50
        2         2 Component Signals                20
        3         3 Component Signals                20
        4         4 Component Signals                20
        5         5 Component Signals                20
                                         Total      130

Examples of pure and mixed tone signals are shown in Figure 56.
The sampling interval was chosen to be:

$$T_s = 2\pi/40 = 0.15708 \text{ sec.} \qquad (9.9)$$

which gives a radian sampling frequency of 40. Each $x_j(t)$
was sampled 200 times. Thus the highest frequency of 10 was
sampled four times per cycle. One hundred percent of a cycle
of the lowest frequency, 0.2, was included in the sample window.
(Note that $t = iT_s$ in discrete form in Equation 9.8.)
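The data base synthesis can be sketched as follows. This is a minimal illustration under stated assumptions: each signal is taken to be a bias term plus a sum of sinusoids with the parameter ranges listed for Equation 9.8, the phase range is taken as 0 to 2π, the sample index starts at zero, and all function and variable names are hypothetical.

```python
import math
import random

T_S = 2 * math.pi / 40      # sampling interval of Equation 9.9
N_SAMPLES = 200             # samples per signal
SIGNALS_PER_I = {1: 50, 2: 20, 3: 20, 4: 20, 5: 20}   # 130 signals per set


def make_signal(n_components, rng):
    """Return (components, bias, samples) for one synthetic signal."""
    bias = rng.uniform(-0.5, 0.5)
    components = []
    for _ in range(n_components):
        amplitude = rng.uniform(0.0, 1.0)
        phase = rng.uniform(0.0, 2 * math.pi)
        # uniform on a log scale between 0.2 and 10
        omega = math.exp(rng.uniform(math.log(0.2), math.log(10.0)))
        components.append((amplitude, omega, phase))
    samples = [
        bias + sum(a * math.sin(w * m * T_S + p) for a, w, p in components)
        for m in range(N_SAMPLES)
    ]
    return components, bias, samples


def make_set(rng):
    """Generate one set of 130 signals with the stated mix of I values."""
    return [
        make_signal(n_components, rng)
        for n_components, count in SIGNALS_PER_I.items()
        for _ in range(count)
    ]
```

Calling `make_set(random.Random(0))` three times with different seeds would give three independent sets, one each for synthesis, overfitting control, and evaluation.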


The true energy in the passband for each $x_j(t)$, assuming that the
signal continued for an indefinite period of time, was calculated
as:

$$e_j = \frac{1}{2} \sum_{i=1}^{I} a_{ji}^2 \, |g(\omega_{ji})|^2 \qquad (9.10)$$

where $|g(\omega_{ji})|$ is the magnitude of the Butterworth filter for
the i-th frequency. The factor of 1/2 enters Equation 9.10 because
the power of a sine wave is one-half of its squared amplitude. It
can be shown that for an eight-pole Butterworth filter the
amplitude at frequency ω is obtained from:

$|g(\omega)|^2$ = [Equation 9.11: a ratio built from factors of the
form $(\omega_c \cos\phi_k)^2 + (\omega \pm \omega_c \sin\phi_k)^2$,
with $\omega_c = \omega_l, \omega_h$ and k = 1, 2] (9.11)

where $\omega_l = 1$, $\omega_h = 2$, $\phi_1 = 22.5°$, and
$\phi_2 = 67.5°$.
The data base used to train the ALN, then, consisted of two of the
three sets of 130 waveforms. For each waveform, the true energy
within the eight-pole Butterworth filter passband (1 < ω < 2)
is given (from Equation 9.10). Therefore, for each waveform,
a 201-dimensional component vector is obtained; 200 samples
of the signal are the inputs and the true energy is the output,
or dependent, variable.
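The energy calculation of Equation 9.10 can be sketched as below. The filter magnitude is passed in as a function so that any response can be used; the brick-wall `ideal_band` response shown here is a hypothetical stand-in used only to keep the example short, and is not the eight-pole Butterworth response used in the report.

```python
def passband_energy(components, gain):
    """Energy passed by the filter: one-half the sum of squared
    (amplitude times filter magnitude) over all sinusoidal components.

    components: iterable of (amplitude, frequency) pairs
    gain:       function returning the filter magnitude at a frequency
    """
    return 0.5 * sum((a * gain(w)) ** 2 for a, w in components)


def ideal_band(w, low=1.0, high=2.0):
    """Hypothetical brick-wall response: unity inside the band, zero outside."""
    return 1.0 if low <= w <= high else 0.0
```

With this stand-in response, a unit-amplitude tone at frequency 1.5 contributes an energy of 0.5, while a tone at frequency 5 contributes nothing.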


9.3.3 DATA BASE PARAMETERIZATION


In order to reduce the number of inputs and, potentially, generate
more meaningful inputs, parameters of the 200 samples from each
waveform were computed. These parameters were arbitrarily chosen,
and there is no reason to believe that they are the best, or
even a good set. The definitions of the 31 parameters are given
in Table 12. The widths of the 10 histogram bins were arbitrarily
chosen as: 0 to 0.01, 0.01 to 0.022, 0.022 to 0.047, 0.047 to
0.10, 0.10 to 0.22, 0.22 to 0.47, 0.47 to 1.0, 1.0 to 2.2, 2.2
to 4.7, and 4.7 to ∞.


TABLE 12

WAVEFORM PARAMETERS

Number   Description

1        Mean signal value:
         $\bar{x}_j = \frac{1}{200} \sum_{m=1}^{200} x_j(m)$

2        Standard deviation of signal about its mean:
         $\sigma_{x_j} = \sqrt{\frac{1}{199} \sum_{m=1}^{200} \left( x_j(m) - \bar{x}_j \right)^2}$

3        Maximum deviation from signal mean:
         $d_j = \max_{m=1,\ldots,200} \left| x_j(m) - \bar{x}_j \right|$

4        Mean magnitude of point-to-point difference:
         $\bar{d}_j = \frac{1}{199} \sum_{m=2}^{200} \left| x_j(m) - x_j(m-1) \right|$

5        Standard deviation of point-to-point differences:
         $\sigma_{d_j} = \sqrt{\frac{1}{198} \sum_{m=2}^{200} \left( \left| x_j(m) - x_j(m-1) \right| - \bar{d}_j \right)^2}$

6-15     Number of hits in each of 10 bins of the point-to-point
         differences histogram: $\left| x_j(m) - x_j(m-1) \right|$

16-25    Number of hits in each of 10 bins of the signal deviations
         about mean histogram: $\left| x_j(m) - \bar{x}_j \right|$

26-31    Second, third, and fourth moments of the two above histograms.
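The first 25 parameters of Table 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a value falling exactly on a bin boundary is placed in the upper bin, and parameters 26-31 are omitted because the table does not say how the histogram moments are defined.

```python
import math

# Upper edges of the first nine bins; the tenth bin is open-ended.
BIN_EDGES = [0.01, 0.022, 0.047, 0.10, 0.22, 0.47, 1.0, 2.2, 4.7]


def histogram(values):
    """Count the hits in each of the 10 bins (boundary values go to the upper bin)."""
    counts = [0] * 10
    for v in values:
        k = 0
        while k < 9 and v >= BIN_EDGES[k]:
            k += 1
        counts[k] += 1
    return counts


def waveform_parameters(x):
    """Return parameters 1-25 of Table 12 for a list of samples x."""
    n = len(x)
    mean = sum(x) / n                                        # parameter 1
    dev = [abs(v - mean) for v in x]
    std = math.sqrt(sum(d * d for d in dev) / (n - 1))       # parameter 2
    max_dev = max(dev)                                       # parameter 3
    diff = [abs(x[m] - x[m - 1]) for m in range(1, n)]
    mean_diff = sum(diff) / len(diff)                        # parameter 4
    std_diff = math.sqrt(                                    # parameter 5
        sum((d - mean_diff) ** 2 for d in diff) / (len(diff) - 1)
    )
    # Parameters 6-15 and 16-25 are the two 10-bin histograms.
    return [mean, std, max_dev, mean_diff, std_diff] + histogram(diff) + histogram(dev)
```

For a 200-sample waveform the divisors n - 1 and len(diff) - 1 evaluate to 199 and 198, matching the table.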


9.3.4 NETWORK TRAINING

The within-band energies of the 130 signals computed from
Equation 9.8 varied over a dynamic range of greater than 12
decades. Since the network training process synthesizes functions
which minimize squared errors, and it is desirable in this example
problem to predict the small energies (within the passband) very
accurately, the network was trained to predict the log of the
energy rather than the energy itself:

$y_j = 10 \log_{10} e_j$    (9.12)

In  the  training  process,  the  first  set  of  130  signals  was  used  to 
synthesize  the  network.  The  second  set  of  130  signals  was  used 
to  avoid  overfitting  the  polynomial  energy  estimation  model. 

The  third  set  of  130  signals  was  used  to  evaluate  the  resultant 
network  performance. 
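The reason for training on the log of the energy can be illustrated with a short Python sketch. The function names are hypothetical; only the transformation of Equation 9.12 and its inverse are taken from the text.

```python
import math


def energy_to_db(e):
    """Training target of Equation 9.12: y = 10 log10(e)."""
    return 10.0 * math.log10(e)


def db_to_energy(y):
    """Antilog that recovers an energy from a predicted y."""
    return 10.0 ** (y / 10.0)


def db_error(true_energy, estimated_energy):
    """Magnitude of the estimation error expressed in dB."""
    return abs(energy_to_db(estimated_energy) - energy_to_db(true_energy))
```

On the log scale a factor-of-two error costs about 3 dB whether the true energy is very small or very large, so a squared-error criterion no longer lets the largest energies dominate the fit.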

A block diagram of the resultant energy estimation system is
shown in Figure 57. The input signal s(t) is passed through a
preprocessor which extracts the 31 parameters. This information
forms an input vector to the ALN, which then estimates the log of the
energy. The antilog is computed to yield the estimated energy,
e, within the desired passband.

9.3.5  RESULTS 

The  results  of  the  network  training  are  presented  in  two  forms. 

First, energy estimates for the signals of mixed frequencies were
accurate to within 5 dB on the average over a range of 60
dB. These accuracies are based on 130 mixed frequency signals
which were not used in the network training process.

Second, energy estimates for the pure tone signals were accurate
to within 6 dB on the average over a range of 60 dB. These
accuracies are based on 50 single frequency signals which were
not used in the network training process.


It is also possible to plot a Bode diagram of the ratio of total
signal energy to the estimated energy within the band. This was
done by constructing single frequency sinusoids (I = 1 in
Equation 9.8) with $A_1 = 0.5$ and $K_1 = 0$. The frequency
term was then varied slowly from 0.2 to 10. The resultant signal
was parameterized, and its true and estimated energies were found via
Equation 9.11 and the ALN function, respectively. Ideally, the
response would resemble Figure 9.7, except that the decibel scale
would be in terms of power rather than amplitude. The plot for
the ALN system is shown in Figure 9.10. It can be seen in this
figure that the ALN energy estimator has in fact learned the
overall trend, but that it is not as accurate as desired in
the passband region.


The pair of curves (Figure 58) for the ALN system is a measure
of the phase sensitivity of the synthesized function. This
was estimated by choosing six equally-spaced values of ψ in the
range 0 to 2π for each value of ω. This gave, for each value
of ω, six single-component sinusoids possessing the same frequency
but with different phases. Then, six values of the estimated
energy were obtained for each ω, and these were averaged. The mean
energy plus and minus its standard deviation are actually plotted in
Figure 58, so that, for each ω, the plotted upper value is
$\bar{e} + \sigma_e$ and the lower value is $\bar{e} - \sigma_e$.

Since these two values are so close for all ω, it was concluded
that the trained estimator is reasonably phase insensitive.
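The phase sensitivity test described above can be sketched as follows. This is a minimal illustration, not the procedure actually coded for the study: the sampling interval `dt`, the use of the sample (n - 1) standard deviation, and the stand-in estimator `mean_square` are all assumptions, and the trained ALN itself is not reproduced here.

```python
import math


def mean_square(signal):
    """Stand-in estimator (assumption): mean-square value of the samples."""
    return sum(s * s for s in signal) / len(signal)


def phase_sensitivity(estimator, omega, n_phases=6, n_samples=200,
                      dt=0.05, amplitude=0.5):
    """Return (mean - std, mean, mean + std) of the estimates over the phases."""
    estimates = []
    for k in range(n_phases):
        psi = 2.0 * math.pi * k / n_phases   # equally spaced in 0 to 2*pi
        signal = [amplitude * math.sin(omega * m * dt + psi)
                  for m in range(n_samples)]
        estimates.append(estimator(signal))
    mean = sum(estimates) / n_phases
    std = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_phases - 1))
    return mean - std, mean, mean + std
```

Calling `phase_sensitivity(mean_square, omega)` for a sweep of `omega` values yields the upper and lower curves; a narrow gap between them indicates an estimator that is insensitive to phase.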

Upon closer examination of the 130 signals used to train the
polynomial  function,  it  was  found  that  less  than  15  percent  of 
the  130  signals  had  any  appreciable  amplitude  in  the  passband 
region.  So,  on  the  assumption  that  the  performance  (Figure  9.10) 
was  due  mainly  to  lack  of  appropriate  training  data,  an  additional 
36  signals  were  generated  and  added  to  the  130  to  form  a new, 
augmented  training  set  of  166  signals.  The  36  new  signals  were 
all  single  sinusoids  (I  = 1 in  Equation  9.8)  with  the  following 
characteristics: 

$K_j = 0$ for $j = 1, \ldots, 36$

$\psi_j = 0$

$A_j$ = random uniform between 0.3 and 0.7

$\omega_j = 0.5 + (j-1)\,0.1$;  $j = 1, \ldots, 36$

Thus,  the  36  new  signals  all  had  an  average  amplitude  of  0.5 
and  were  centered  around  the  passband  position  of  the  overall 
frequency  range. 
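A sketch of how the 36 additional training signals might be generated is given below. The sample count of 200 follows the text; the sampling interval `dt`, the random seed, and the function name are assumptions, and both the phase and the offset are taken as zero.

```python
import math
import random


def augmented_signals(n_samples=200, dt=0.05, seed=0):
    """Return 36 tuples (omega, amplitude, samples) of single sinusoids."""
    rng = random.Random(seed)
    signals = []
    for j in range(1, 37):
        amplitude = rng.uniform(0.3, 0.7)   # random uniform amplitude
        omega = 0.5 + (j - 1) * 0.1         # 0.5, 0.6, ..., 4.0
        samples = [amplitude * math.sin(omega * m * dt)
                   for m in range(n_samples)]
        signals.append((omega, amplitude, samples))
    return signals
```

The frequencies run from 0.5 to 4.0 in steps of 0.1, so the passband 1 < ω < 2 is covered far more densely than in the original randomly generated set.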

A new ALN was found from this training set and its performance
is given in Figure 59. Comparing Figures 58 and 59 illustrates
the dramatic effect of creating the energy estimator with
a more appropriate training set. The assumption that the ALN
performance was dependent mainly on the availability of a
representative data base is confirmed. The average error was reduced
from 5 dB to 4 dB for mixed frequencies and from 6 dB to 3 dB for
pure tones.

It is of interest to establish the effect of adding still more
data to the training set, but this was not done in the present
study.

The structure of the polynomial network (i.e., the net trained with
the 36 extra signals) is shown in Figure 60. The number of
elements used is 26 and the number of coefficients in the network
is 146.

FIGURE 59: ENERGY BAND PASS FREQUENCY RESPONSE FOR THE NEW ALN SYSTEM


A block diagram of a conventional energy band estimator is shown
in Figure 61 for comparative purposes to Figure 60.

9.3.6 DISCUSSION

Though the conventional system is a more accurate estimator of
the desired waveform property, particularly within the passband,
it was designed with a priori knowledge of the formula for
computing the energy parameter. The network system was designed
by a fully automated process which had no a priori knowledge of
any formula for the parameter. The network training algorithm
synthesized its own formula from empirical data. The mathematical
equations of the network system appear to be significantly
different from those of the conventional system, but the results
are comparable.

The effect of the ALN filtering action can be viewed in the time
domain in the following way. A signal to be filtered can be
generated from Equation 9.8 as:

$x(t) = \sum_{i=1}^{I} A_i \sin(\omega_i t + \psi_i) + K$

The amplitude factors, $A_i$, should properly be written in the time
domain as $A_i(t)$. In the frequency domain, these magnitudes are
$A_i(\omega)$.

The effect of the ALN filter is to alter these magnitudes as:

$A_i'(\omega) = A_i(\omega)\,|H(\omega)|$

where $|H(\omega)|$ is the magnitude from the Bode plot of the ALN
filter. Ideally, $|H(\omega)|$ would resemble Figure 55 for the ω
of interest. In actuality, these values are obtained from
Figure 59.


In order to reconstruct the signal, x(t), and to observe the
filtering effect, the time-domain representations of $A_i'(\omega)$
are found and:

$x'(t) = \sum_{i=1}^{I} A_i' \sin(\omega_i t + \psi_i) + K$

One comparison between x(t) and x'(t) would be their cross
correlation.

9.4 CONCLUSIONS


The ALN training procedure is a very powerful tool for synthesizing
a nonlinear filter to estimate a waveform parameter whose
precise mathematical definition is not known a priori but is
inherently represented within empirical data.

ALN systems have the advantage that they fall within the
transversal filter category and thus do not have the round-off accuracy
and numerical stability problems often associated with recursive
filters and particularly nonlinear recursive systems.

Another advantage of the ALN system is that it is trainable via
a flexible software/hardware configuration. This means that the
same hardware can be configured to extract signal parameters
and to solve problems that arise from time to time. Therefore,
the usefulness of ALN systems is not limited to preprogrammed and
structured algorithms.

SECTION X


NON-POLYNOMIAL PREPROCESSOR FUNCTIONS
10.1  INTRODUCTION 

The present element circuitry developed by RCA under Contract
F33615-73-C-1089 is capable of computing the following
mathematical functions:

P1:  $y = a_0 + a_1 x_1$

P2:  $y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2$

P3:  $y = a_0 + \ldots$

These functions may be used as building blocks for many-variable,
high degree polynomials and are thus suitable for the computation
of polynomial networks and for many preprocessing functions.
However, there are several types of preprocessing functions which
are typically used in waveform processing applications where
polynomial networks are used but which cannot be computed within
the present element design.
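The building-block idea can be illustrated with the two expressions that are legible in the listing above, P1 and P2. The Python sketch below uses hypothetical function names and made-up coefficient values; P3 is omitted because only its leading term is readable.

```python
def p1(a, x1):
    """P1: y = a0 + a1*x1."""
    return a[0] + a[1] * x1


def p2(a, x1, x2):
    """P2: y = a0 + a1*x1 + a2*x2 + a3*x1*x2."""
    return a[0] + a[1] * x1 + a[2] * x2 + a[3] * x1 * x2


def two_layer(coeffs, x):
    """Cascade three P2 elements: two in the first layer feed one in the second.

    Each first-layer output is of degree 2 in its inputs, so the cross term
    of the second-layer element yields a polynomial of degree 4 in x.
    """
    u = p2(coeffs[0], x[0], x[1])
    v = p2(coeffs[1], x[2], x[3])
    return p2(coeffs[2], u, v)
```

For example, with coefficients `[0, 0, 0, 1]` in every element, `two_layer` returns the product of its four inputs, a fourth-degree term that no single element could compute.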

It is desirable to augment the computing capabilities of the
present element so that all preprocessing and network computations
may be performed in the same piece of hardware. The following
functions are often used in signal processing applications:

(a)  Counting the number of zero crossings in a time series.

(b)  Counting the number of threshold crossings in a time
series.

(c)  Finding  the  peak  value  of  a time  series. 

(d)  Rectification. 

(e)  Hard  limiting. 

(f)  Sign  detection. 

(g)  Variable  threshold  detection. 

(h)  Dual  threshold  detection. 

This section presents preliminary functional designs, circuit
diagrams, and timing diagrams for the above functions. All
functions have been combined into a single computational unit to
minimize total circuitry. The functions include processing on two's
complement, 24-bit fixed point words.

The objective of the work is to provide a baseline design for a
nonpolynomial processor which would ultimately undergo large
scale integration for use along with the polynomial processor.

The  circuitry  contains  approximately  3,100  equivalent  gates. 

It  is  estimated  that  approximately  2,000  additional  gates  would 
be  required  to  expand  the  unit  with  exponent  logic  to  provide  a 
full  floating  point  capability. 

10.2  FUNCTIONAL  DESCRIPTION 

Since  the  polynomial  element  is  designed  to  receive  eight  data 
inputs  simultaneously,  the  nonpolynomial  logic  is  configured  the 
same  way  to  provide  maximum  compatibility.  The  eight  inputs 
are  word  parallel,  bit  serial.  The  eight  input  words  are  stored 
in a set of eight twenty-five bit registers. The twenty-fifth
bit, at the leading edge of the word, is a repetition of the
sign bit and is used to prevent overflow when adding two's
complement twenty-four bit words.

The typical computation cycle for the nonpolynomial logic consists
of a data input subcycle where the eight words are clocked in
in parallel with a series of 25 clocks, and an output subcycle
where the selected function is simultaneously computed and clocked
out with a series of 24 clocks. The computational form which
implements these two subcycles is shown in Fig. 62.

In general, each of the functions of the nonpolynomial processor
requires several full cycles to process a waveform. With one
exception (rectification), the first cycle on any time series
analysis is an initialization cycle and consists of loading the
analysis parameters, e.g., thresholds or limiting values. The
subsequent cycles involve the loading and processing of the time
series data.

There are five registers for the storage of analysis parameters.
Some functions, for example the zero crossing counter, require that
cumulative results be maintained over several "cycles" of
processing. There are nine registers for intermediate results storage.
Thus, there are a total of twenty-two data storage registers:
eight for time series data, five for analysis parameters, and nine
for intermediate results.

For consistency of operation, a common set of clocks is used for
all functions and is used on both initialization and processing
cycles.

10.3 LOGIC DESCRIPTION

The logic diagrams for the nonpolynomial processor are shown in
Figures 64 through 70 (at the end of this section). Descriptions
of the logic for the individual functions are presented in this section.

A timing diagram for the clocks is shown in Fig. 63. Definitions
for each of the clocks are as follows:


$C_{in}$       24-pulse train to clock data into the processor.
               (The input data are stored in the data input
               registers on Figure 3.)

$C_{sb,in}$    Single pulse on input subcycle which clocks the
               sign bit of the incoming data into sign bit
               storage flip-flops (see Figure 4). Occurs
               simultaneously with the 24th $C_{in}$ pulse.

$C_{in}'$      The 25th clock on the input subcycle. Repeats
               the sign bit. Occurs immediately after the 24th
               $C_{in}$ pulse.

$C_{in}''$     Occurs immediately after $C_{in}'$. Is a single pulse.
               (Not used.)

$C_{out}$      24-pulse train to clock out the result of the
               nonpolynomial processor. Occurs after the input
               subcycle.

$C_{sb,out}$   Clocks the sign bit into intermediate storage.
               Occurs simultaneously with the 24th $C_{out}$ pulse.

Clear          Clears logic for next cycle. Occurs immediately
               prior to $C_{in}$.

The eight functions specified in Section 10.1 may be implemented
with five logic groups. Counting the number of zero crossings
is done with the threshold crossing count logic by setting the
threshold to zero. Sign detection and threshold detection are
performed with the dual threshold detection logic by setting both
thresholds to the same value to obtain threshold detection, and
by setting both thresholds to zero to obtain sign detection.
The five different functions are specified by a three-bit control
word as follows:

Function                                      Schematic
Select Lines    Function                      Designations

000             Count threshold crossings     FTC
001             Find peak value               FYMAX
010             Rectification                 FRECT
011             Hard limiting                 FHL
100             Dual threshold detection      FDTD

The logic controlling the function select is shown in Figure 70.

10.3.1 COUNTING THE NUMBER OF THRESHOLD CROSSINGS IN A TIME SERIES

Two types of input data are required to compute the number of
threshold crossings in a time series: the threshold value and
the time series samples. The threshold is input to the processor
on the initial cycle, and the series data is input sequentially,
in groups of eight, on the subsequent cycles.

The initialize cycle is so noted by setting the initialize line
high during the input subcycle. The threshold is input on the
first of the eight data input lines and the initialize line will
cause it to be stored in the first parameter storage register
shown in Figure 64. The cumulative crossing register shown on the
lower right portion of Figure 66 is also set to zero on the
initialize cycle.

On the second cycle, the first eight time samples are input to the
eight data lines. As shown in Figure 65 (left side), the threshold
is subtracted from each of the eight data words. This computation
occurs on the input subcycle. The signs of the resultant
subtractions, which indicate whether each signal is above or below the
threshold, are stored in a set of eight flip-flops.

As shown in Figure 66, threshold crossings are detected by a series
of exclusive-or gates which detect, between one data point and the
next, a change in the sign bit. If the sign bit between two
successive data points does change, it indicates that the signal
crossed the threshold between those two points and the crossing
count should be incremented.

Four three-bit adders are used to count up to seven crossings
between the eight data points. This sum is added to the cumulative
threshold crossing register on the output subcycle. The new
cumulative sum is passed to the processor output logic for use when
the time series is complete and is also passed back into the
cumulative register for use on the next cycle if the series is not
complete. Intermediate sums may be read out if so desired by the
host processor.

Note  that  although  time  series  data  points  1 through  8 were  input 
on  the  first  data  cycle,  points  8 through  15  must  be  input  on  the 
subsequent  cycle.  The  eighth  point  from  one  cycle  must  be  repeated 
as  the  first  on  the  next  cycle  so  that  a crossing  between  that 
point  and  the  next  one  in  the  time  series  may  be  detected. 
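The need to repeat the eighth point can be checked with a small behavioral model in Python. This is a sketch of the counting rule only, not of the serial logic; the function names are hypothetical, and a sample exactly equal to the threshold is treated as non-negative (sign bit 0).

```python
def sign_bits(samples, threshold):
    """Sign bit of (sample - threshold): 1 if negative, else 0."""
    return [1 if (x - threshold) < 0 else 0 for x in samples]


def count_crossings_direct(series, threshold=0):
    """Reference count over the whole series at once."""
    s = sign_bits(series, threshold)
    return sum(a ^ b for a, b in zip(s, s[1:]))


def count_crossings_in_cycles(series, threshold=0, width=8):
    """Accumulate the count over groups of eight, repeating the last point."""
    total = 0
    start = 0
    while start < len(series) - 1:
        s = sign_bits(series[start:start + width], threshold)
        total += sum(a ^ b for a, b in zip(s, s[1:]))   # at most 7 per group
        start += width - 1    # last point of this group opens the next one
    return total
```

With the overlap, the cycle-by-cycle total agrees with the direct count; stepping by eight instead of seven would miss any crossing that falls between the eighth and ninth points.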

10.3.2 FINDING THE PEAK VALUE OF A TIME SERIES

Like the threshold crossing counter, the peak value detector
operates on sequential portions of the time series. However, the
data sequence may be modified. For the peak value detector, the
first eight samples, $x_1$ through $x_8$, are input to the nonpolynomial
logic on the first cycle, $x_9$ through $x_{16}$ on the second cycle, and
so forth.

The peak value register, at the right side of Figure 66, is set
to the most negative representable number ($-2^{23}$) on the
initialize cycle.

The peak value function operates in a pipeline decision-tree fashion
where there are four levels of decision making. In the first level,
four pairs of data are compared to find four pairwise maxima.
In the second level, two four-way maxima are found, and in the
third level, the eight-way maximum is found. The fourth decision
level involves the comparison of the eight-way maximum and the
prior time series maximum (peak value, $x_{max}$, register).

Though there are four decision levels, the decision functions
overlap in computation cycles. While the eight input data words
are being input to the eight buffer registers (Figure 64), they
are simultaneously input to four pairwise comparators (Figure 66)
and compared. On the computation cycle (the comparison
having actually been made on the input subcycle), the higher of
each pair of data words is selected for transmission to the second
decision level.

The  second  level  is  designed  the  same  as  the  first;  however,  the 
input  cycle  to  the  second  level  corresponds  (in  time)  to  the 
output  cycle  of  the  first  level.  Similarly,  the  third  level  input 
cycle  corresponds  to  the  second  level  output  cycle,  and  the  fourth 
level  input  cycle  corresponds  to  the  third  level  output  cycle. 

The net delay from the eight data word input to the fourth level
output is two and one-half cycles. Thus after the original time
series has been input, two additional input-output cycles must be
run before the final output result has been processed through
the decision levels.
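The four decision levels can be modeled functionally as below. The Python sketch reproduces only the comparison tree and the running maximum; it does not model the pipeline timing, and the padding of a short final group with the most negative value is an assumption made so that every group has eight words.

```python
MIN_VALUE = -(2 ** 23)   # initial content of the peak value register


def eight_way_max(words):
    """Levels 1 to 3: pairwise maxima, four-way maxima, eight-way maximum."""
    level1 = [max(words[i], words[i + 1]) for i in range(0, 8, 2)]
    level2 = [max(level1[0], level1[1]), max(level1[2], level1[3])]
    return max(level2[0], level2[1])


def peak_value(series):
    """Level 4: compare each eight-way maximum with the running peak."""
    peak = MIN_VALUE
    for start in range(0, len(series), 8):
        group = list(series[start:start + 8])
        group += [MIN_VALUE] * (8 - len(group))   # pad a short last group
        peak = max(peak, eight_way_max(group))
    return peak
```

Because the groups do not need to overlap for this function, repeating a point between groups (as the threshold crossing protocol requires) leaves the result unchanged.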


10.3.3  RECTIFICATION 


The  rectification  function  simply  inverts  negative  numbers  to 
make  them  positive.  Eight  data  words  are  processed  in  parallel 
on  each  cycle.  No  initialization  cycle  is  required. 

On  the  input  subcycle,  the  sign  bits  of  the  data  are  stored  in  the 
flip-flops  shown  on  Figure  65.  The  data  outputs  are  controlled 
by  these  sign  bits.  If  the  sign  is  positive  (0),  the  data  is 
clocked  out  from  the  input  register  as  it  was  input.  If  the  sign 
was  negative  (1),  all  the  bits  are  inverted  and  serially  added 
to  00... 01,  to  give  the  rectified  value. 
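The invert-and-add-one rule can be checked on 24-bit words with the Python sketch below. The helper names are hypothetical, and the addition is done in one step rather than bit-serially as in the hardware.

```python
WORD_BITS = 24
MASK = (1 << WORD_BITS) - 1


def to_word(value):
    """Encode a signed integer as a 24-bit two's complement word."""
    return value & MASK


def rectify_word(word):
    """Pass positive words through; invert negative words and add 00...01."""
    sign = (word >> (WORD_BITS - 1)) & 1
    if sign == 0:
        return word
    # The result for -2**23 is 2**23, which needs one bit beyond the 24-bit
    # word; it is returned here as an ordinary non-negative integer.
    return ((~word) & MASK) + 1
```

For example, `rectify_word(to_word(-5))` inverts `0xFFFFFB` to `0x000004` and adds one to give 5.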

10.3.4 HARD LIMITING

The hard limiting function takes a time series input and produces
a time series output which is equal to the input unless the input
value exceeds an upper threshold, TU, or is less than a lower
threshold, TL.

On the initialization cycle, the limiting values are placed in
corresponding parameter storage registers (Figure 62).

TU and TL are gated from the first and second data inputs,
respectively.

Successive cycles produce the hard limited values from the time
series data. The time series data are stored in the data input
registers (Figure 62), and the upper and lower limits are
simultaneously subtracted from them, as shown in Figure 65. The sign
bits of the resultant subtractions are clocked into flip-flops
(Figure 65) and are used to control the data output (Figure 68).
If $x_i$ is greater than the upper limit, the sign of $x_i - TU$ will
be positive (sign bit equal to 0), and the upper limit value TU will
be clocked out as $y_i$. If $x_i$ is less than the lower limit, the
output will be equal to the lower limit TL. If neither of those
conditions exists, $x_i$ is between the limits, and $y_i$ will be equal
to $x_i$.
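The output selection for hard limiting can be expressed as a short behavioral model in Python. The function names are hypothetical; the selection is driven by the sign bits of the two subtractions, so an input exactly equal to TU is treated as "not below TU" and yields TU, which is the same value as the input.

```python
def sign_bit(value):
    """Two's complement sign bit: 1 for negative, 0 otherwise."""
    return 1 if value < 0 else 0


def hard_limit(x, tu, tl):
    """Select TU, TL, or x according to the signs of x - TU and x - TL."""
    if sign_bit(x - tu) == 0:    # x is at or above the upper limit
        return tu
    if sign_bit(x - tl) == 1:    # x is below the lower limit
        return tl
    return x                     # x lies between the limits
```

Applied to the inputs -20, -10, 0, 10, 20 with TU = 10 and TL = -10, the model returns -10, -10, 0, 10, 10.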
10.3.5 DUAL THRESHOLD DETECTION

In dual threshold detection, the output is A, B, or C,
depending upon whether the input is above an upper threshold,
between the upper and lower thresholds, or below the lower
threshold. The values of A, B, and C (typically +1, 0, -1) are user
specified on the initialization cycle, along with the thresholds
TU and TL. TU and TL are put in via the first two data inputs,
respectively, and A, B, and C are put in through inputs 3, 4, and 5.

The time series data are input in groups of eight on successive
cycles. As shown in Figure 65, TU and TL are simultaneously
subtracted from each of the data words, and the resulting sign bits
indicate whether the data is above the threshold (sign bit
positive, 0) or below (sign bit negative, 1). The subtraction
and sign bit storage are accomplished on the input subcycle.

The values A, B, or C are clocked out on the output cycle
according to the resulting subtraction sign bits. If $x_i$ is greater
than TU, A is clocked out. If $x_i$ is less than TL, C is clocked
out. If neither of those conditions exists, $x_i$ is between the
thresholds and B is clocked out.
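A behavioral sketch of dual threshold detection in Python follows, including the two special cases (single threshold detection and sign detection) obtained by setting the thresholds equal. The function names and default output values are illustrative. As a consequence of modeling the decision with sign bits, an input exactly equal to TU produces A.

```python
def sign_bit(value):
    """Two's complement sign bit: 1 for negative, 0 otherwise."""
    return 1 if value < 0 else 0


def dual_threshold(x, tu, tl, a=1, b=0, c=-1):
    """Return A, B, or C from the sign bits of x - TU and x - TL."""
    if sign_bit(x - tu) == 0:
        return a                 # at or above the upper threshold
    if sign_bit(x - tl) == 1:
        return c                 # below the lower threshold
    return b                     # between the thresholds


def threshold_detect(x, t):
    """Single threshold detection: both thresholds set to the same value."""
    return dual_threshold(x, t, t)


def sign_detect(x):
    """Sign detection: both thresholds set to zero."""
    return dual_threshold(x, 0, 0)
```

When the two thresholds coincide the middle value B can never be selected, so the output is two-valued, as required for threshold and sign detection.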

10.3.6  SIMULTANEOUS  OPERATION  OF  MULTIPLE  FUNCTIONS 

The  way  the  logic  is  configured,  it  is  possible  to  perform  some 
of  the  functions  simultaneously.  The  data  from  a time  series 
may  be  processed  once  and  the  following  combination  of  functions 
may  be  computed  in  parallel: 

• Count of threshold crossings

• Find the peak value

• One of the following three:

    Rectification
    Hard limiting
    Dual thresholding

This simultaneous processing is possible because separate logic
is used to compute the different functions, and because the
threshold-crossing-count and peak-value data need not be output until after
the whole time waveform has been processed.

As the waveform is being processed, the rectified, limited, or
thresholded time waveform may be output by appropriate control
of the function select lines. After the waveform has been
processed, the function select lines may be modified on the
subsequent cycle to obtain the threshold crossings. On the third cycle
after the processing, the waveform peak value will be available.


It must be noted, however, that when multiple functions are
performed simultaneously, the input data sequence must be properly
handled.

In the above example, the data sequence is dictated by the threshold
crossing protocol, where the last data point on one cycle is
duplicated as the first point on the next. This will not impact
the results of the peak detection logic, but care must be taken
in handling the output waveform because of the data duplication.


Significant time savings may be achieved by simultaneous
computation of the functions.

SUMMARY AND CONCLUSIONS

Methods for implementing elements using available LSI technology and capable
of efficient realization of a family of multinomial expressions have been
explored. These elements, configurable in arrays, represent a new means for
the hardware solution to a variety of signal processing problems. Relieving
software requirements and providing for a wide range of high speed, real time
solutions, the resultant Configurable Polynomial Array (CPA) is applicable to
both conventional known linear transformations and processing (such as the FFT)
as well as non-linear transformations derived by training algorithms for the
realizations of Adaptive Learning Networks (ALN's). These CPA's then represent
a means for performing all of the critical processing functions associated with
classification, discrimination, modeling, identification or recognition type
problems. In the performance of such functions, the CPA is much faster than
programmed, serial computers while relieving software problems associated with
complex signal processing.

Alternative architectural approaches for implementing elements were shown and
compared as a function of calculating precisions. These approaches were basically
characterized by whether they used all-parallel or serial-parallel multipliers,
their control structure (hard-wired or micro-processor), and whether they were
to be used in a multiplexed or fully populated array mode.

Supporting precision requirements investigation and analysis showed (by theory
and simulations) the application areas of word lengths ranging from relatively
small (12 to 16 bits, fixed point) up to 32-bit floating point notation.

To realize one of the more flexible and efficient element approaches, a
24-bit serial-parallel CMOS/SOS multiplier was designed and fabricated
for use in the mantissa processor of an element capable of evaluation of
any one of a family of 5 multinomial expressions. Eight of these circuits,
along with other I²L and MOS parts and working in conjunction with an
8-bit floating point processor, were assembled into a brassboard for
demonstration of the CPA element concept. Related software and hardware
for integrating this brassboard with an HP21MX were also developed.


The  major  conclusions  of  this  project  are: 

1)  CPA's are very well suited to signal and image processing
applications. Because of their electronic configurability,
they can perform nearly all of the bulk arithmetic
operations needed in such processing, and when used in
distributed-function networks they can provide extremely fast solutions
that would require much more time using "general purpose,"
i.e., predominantly serial, computations.

2)  The principal functions for which CPA's should be
configurable are primarily 5 expressions (Section II, Table 1).

3)  Three  nominal  CPA  architectures  have  emerged  as  being  the 
most  effective  and  mutually  complementary  for  realization 
of  CPA  functions  (Section  VI) 

a)  A fast, very precise CPA having 6 to 10 µs throughput
and using fixed-point arithmetic with 24-bit mantissas,
as called for by many real-time signal processing tasks.
(A second version of the CPA would incorporate
floating-point arithmetic capability.) The precision of this
CPA is suitable for networks having five to ten layers
of parallel distributed functions.

b)  A very fast, moderately precise CPA having 1 to 2 µs
throughput and using fixed-point arithmetic with 16-bit
mantissas. This precision is suitable for networks
having less than five layers of parallel functions.

c)  A minimum power consumption, low precision CPA having
voltage control of throughput time (to conserve energy)
between 10 µs and 1 ms and using fixed-point
arithmetic with eight-bit mantissas. This precision is
suitable for networks having a single layer of parallel
functions, if an output accuracy of four bits is sufficient.
This CPA also may be used in networks having two (and
perhaps up to four) layers if an output accuracy of one
bit is acceptable, as in pairwise-voting discriminant
functions for signal classifiers.

A signal  processing  concept  in  which  a configurable  polynomial  building 
block  can  be  considered  as  a "macro"  has  been  demonstrated.  The  availability 
of  a family  of  such  blocks  and  their  use  in  arrays  will  provide  improved 
tools  for  a variety  of  Avionics  signal  processing  requirements.  The 
realizations  of  true  parallel  array  processing  embodying  distributed  logic- 
memory  elements  will  provide  improved  problem  solving  capability  in  conventional 
as  well  as  adaptive  transformation  networks. 

Future  tasks  can  now  confidently  be  addressed  in  the  areas  of  multiple 
element  realizations  under  a common  processor  controller  to  demonstrate 
previously  unattainable  throughputs  together  with  ease  of  software  and 
hardware  mechanization  for  many  important  signal  processing  problems. 
Abstract

Major aspects of the theory and application
of cybernetic systems are summarized
in this paper. The principal topics
discussed are: methods for synthesis of
trainable networks to model relationships
in their environment, self-organizing
control techniques, guided random search
(optimization) procedures that adjust
parameters in adaptive systems, hardware
trends, and results of representative
applications. An attempt is made to
present past and current developments in
their historical context.


1. Introduction

Cybernetic  systems  are  goal  directed, 
employing  information  feedback  to  assess 
the  degree  of  goal  attainment.  They  are 
ultrastable processes, that is, these
systems  become  adapted  to  a problem  envi- 
ronment by  acting  selectively  toward  re- 
sponses of  important  variables  observed  as 
internal  parameters  of  the  system  are 
changed. [8,35]

Realization  of  major  cybernetic  sys- 
tems involves  a number  of  important  subject 
areas,  chief  among  them  being  the  modeling, 
control,  and  adaptation  of  complex  pro- 
cesses. Traditionally,  in  theoretical 
science  and  engineering,  these  topics  have 
been  dealt  with  by  reasoning  directly  from 
underlying  physical  principles,  deriving 
equations  that  express  the  analytical  con- 
clusions. These  conclusions  have  then 
been  compared  with  experimental  findings, 
and  where  agreement  is  lacking  there  has 
been  a stimulus  for  new  advances  in  the 
theory.  It  is  often  taught  that  this 
"scientific  method"  holds  the  key  to  solu- 
tion of  most  technological  problems. 

However,  it  is  being  found  that  com- 
plex processes  are  not  entirely  tractable 
when  approached  in  the  traditional  way. 

It  need  not  be  argued  that  the  "scientific 
method" is  wrong,  but  only  that  its  domain 
of  employment  has  been  too  narrow.  Tech- 
nologists have  mistakenly  assumed  that 
"being  scientific"  meant  achieving  closed- 
form  analytical  representations  for  all 
processes of interest. This may be true
in pure science, but in most other endeavors
the main object is to create systems
that  work.  And  in  these  endeavors,  the 
old  luxury  of  treating  analytically  trac- 
table problems  can  lead  to  failure. 

Neither  is  it  entirely  safe  to  rely 
on  laboratory  experimentation.  The  arti- 
ficial environment  does  not  always  repli- 
cate the  complexities  of  the  real  world. 

Faced  with  the  challenge  of  modeling, 
control,  and  adaptation  of  complex  pro- 
cesses, it  seems  that  our  paradigm  must  be 
more  than  a method  contrived  by  man's 
brain,  it  must  be  the  operation  of  the 
brain  (or  central  nervous  system)  itself. 
The  brain,  one  of  Nature’s  more  remarkable 
designs,  is  an  existence  proof  that  ultra- 
stability can  work  in  complex  processes. 

Cybernetics  is  the  science  of  how  the 
brain  performs  and  how  its  workings  can  be 
realized  in  the  inanimate  parts  of  systems. 

The  theory  of  cybernetic  systems  is 
still  fragmentary,  although  it  is  gradu- 
ally being  placed  on  a rigorous  footing, 
particularly  by  Ivakhnenko,  Rastrigin,  and 
other  investigators  in  the  Soviet  Union. 
Even  though  the  functioning  of  the  brain 
and  procedures  for  copying  this  function- 
ing in  physical  systems  are  only  partially 
perceived,  there  is  presently  very  rapid 
progress  in  applications  work.  Many  of 
the  things  being  done  today  were  only 
talked  about  ten  years  ago.  Cybernetics 
is  beginning  to  make  an  indelible  imprint 
on  engineering  systems.  The  overview  pre- 
sented here  is  an  attempt  to  record  the 
most  important  things  that  are  happening 
and  to  identify  the  trends  in  these  events. 

2. Early History of Cybernetics

Refs.  1-18  are  milestones  in  the 
early  development  of  modern  cybernetics 
theory.  An  essential  thread  woven  through 
all  of  these  references  is  that  determinis- 
tic rules  of  behavior,  while  they  sometimes 
apply  in  the  large,  do  not  appear  to  govern 
microscopic  levels  of  activity  in  problem 
solving by the brain. At these levels,
the laws of chance generally dominate,
with subtle biasing of probabilities being
the principal mechanism for guidance of
the brain's responses.

The brain is endowed with immense
variety of behavior, this richness of activity
apparently being the result of
probabilistically-controlled internal
states. Thus, depending on bias levels
within its neurons and neuron aggregates,
the brain can realize a highly predictable
input-output response at one moment, and at
the next instant its behavior can be entirely
spontaneous, showing very little
obvious relationship to environmental
factors.

On  the  average,  the  activities  of  the 
brain  are  guided  by  its  objective  of  goal 
attainment.  Without  getting  into  the  psy- 
chology of  goals  and  sub-goals,  goal  se- 
lection, and  the  interplay  between  con- 
flicting goals,  consider  how  the  brain 
organizes  itself  vis-a-vis  a given  goal. 
Partly  by  pre-conditioning  and  partly  by 
selective  biasing,  the  brain  establishes 
data  transformations,  makes  inferences 
about  its  environment,  decides  between  al- 
ternatives, and  initiates  actions,  largely 
under  the  dominance  of  the  goal  at  hand. 

This  goal  can  be,  in  many  cases,  simply 
stated,  and  the  degree  of  goal  fulfillment 
is  then  simply  gaged.  Other  times  the  goal 
is  exceedingly  complex.  But  a character- 
istic of  all  goals  is  that  their  mathemati- 
cal description  is  of  a much  lower  dimen- 
sionality than  the  actions  needed  to  ful- 
fill them.  In  simplest  extreme,  the  rate 
of  goal  attainment  becomes  a scalar 
measure  (e.g.,  rate  of  food  intake), 
while  the  directed  activity  still  has  very 
many variables.

The  early  engineering  applications  of 
cybernetics  were  rather  limited,  but  the 
basic principle of probabilistically-controlled
states directed via feedback
measures of goal attainment, thereby creating
rich, one-into-many, dynamic transformations
within artificial brains, became
well established in the literature of
this field by 1960.

3. Recent History of Cybernetics

In retrospect, 1960 was a watershed
year in the development of cybernetics.
At that time, engineers began to turn to
this  embryonic  science  for  help  in  the  so- 
lution of  several  challenging  problems  in 
the  design  of  physical  systems.  One  of 
the  first  problems  treated  was  that  of  ob- 
taining improved  adaptive  control  of  flight 
vehicles. [Barron in 28, 31; Gwinn and
Barron, 34] Another was rapid prediction
of re-entry object trajectories. [Snyder,
Barron, et al., 25; Gilstrap, 33] Along
with work toward these applications, efforts
were intensified to place the theory of
probabilistically-controlled cybernetic
systems on a more rigorous foundation.
The work of Lee [21-24] and Gilstrap [21-24,
30,33] was particularly notable during
the early 1960's.

Contributions by Ivakhnenko [29,38],
Gilstrap [36], Mucciardi [35,39], and
Barron [37] in the late '60's and early
'70's  emphasized  establishment  of  a 
systematic  methodology  for  synthesis  of 
cybernetic  systems.  This  methodology 
will  now  be  outlined. 

4. Modeling via Adaptive
Learning Networks

A cybernetic system operates with a
goal that is to be fulfilled at a future
time. Gilstrap [36] has shown that the
types of functions that must be implemented
in such a system are a predictor, an
objective function for assessment of performance
of the system, and a decision
rule for selection between alternative
actions.  The  predictor  plays  a vital 
role;  it  anticipates  how  closely  the  goal 
will  be  satisfied  if  the  system  pursues  a 
given  plan  of  action.  The  predictor  thus 
allows  prompt  correction  of  system  actions 
following  a disturbance  or  change  in  the 
goal.

Although  many  forms  of  predictor  are 
candidates  for  cybernetic  systems,  those 
of  greatest  utility  are  the  adaptive 
predictors which learn from past experience.
These predictors model future-output
vs. present-input relationships
for the goal-directed system, using
recorded data from past observations.
Because the adaptive predictive transformations
can be created directly from
observations and are not necessarily subject
to a priori assumptions, these predictors
are potentially as accurate as
basic measurements permit.

However,  a powerful  modeling  meth- 
odology is  required  if  the  full  potential 
of  adaptive  predictors  is  to  be  realized. 
Multiple  linear  regression  has  been  tried 
in  many  applications,  but  has  been  found 
deficient  in  the  following  particulars: 

(1) the model must be linear in its
unknown coefficients and, therefore,
its mathematical structure
must be assumed a priori;

(2) the number of data points must
exceed the degrees of freedom
of the model;

(3) the data structures used cannot
be multimodal; and

(4) the least-square-error fitting
criterion must be used exclusively.

Techniques for nonlinear network
modeling have been developed that eliminate
these problems. [25,29,30,33,35-38]
The paper by A. N. Mucciardi in this conference
is an example of the use of these
new  techniques.  The  principal  steps  in 
the  modeling  methodology  are: 

(1) acquisition of data;

(2) data parameterization;

(3) partitioning of data base;

(4) network training, consisting of
selection of most informative
parameters, selection of connectivity
and approximate coefficient
values, optimization of
coefficients, and detection and
avoidance of data overfitting;

(5) latency reduction of network
after computation of its output
sensitivities to input variations.

Let  us  consider  each  of  these  steps 
briefly. 

Data acquisition emphasizes recording
of all variables that can be economically
sensed and that reasonably might
be  relevant  to  the  modeling  task.  (Vari- 
ables not  needed  will  be  automatically 
eliminated  later  in  the  synthesis  proce- 
dure.) Standard  rules  for  the  design  of 
experiments  are  followed  in  the  choices 
of  operating  regions,  specific  conditions 
within  regions,  and  the  number  of  exper- 
iments at  each  condition.  Data  sampling 
rates  are  chosen  to  preserve  the  infor- 
mation content  of  signals  and  avoid 
aliasing.

Data parameterization is performed
to compress the dimensionality of the
original data and thereby reduce the total
synthesis work. Care must be taken not
to discard important information. If in
doubt, all the original variables should
be used. However, when the original
measurements describe waveforms of system
variables, the derived parameters usually
may include conventional quantities such
as measures of energies in specific frequency
bands, Fourier coefficients, integral
functions indicative of waveform
shapes, time derivatives, peak-to-peak
ratios, mean and rms values, etc. Whereas
in linear modeling the extraction of suitable
data features is a crucial task (because
only the input features of linear
models introduce interactions and nonlinear
functions of the variables), this
problem need not arise in modern nonlinear
network modeling; these networks create
their own rich combinations of inputs,
generating, in essence, an additional
nonlinear feature set within each of their
layers.

Partitioning of parameterized data
is done to obtain data subsets for training,
testing (overfit avoidance), and independent
testing (estimation of expected
accuracy) of the network models. A
clustering algorithm [39] is used to insure
that each of these subsets contains
a balanced representation of the total
data base; the constituent points within
each mode (cluster) of the parameterized
data should be distributed in a constant
ratio among the subsets. Clustering analysis
also reveals if the characteristics
of the measured process were stationary
over time and if the sensors provided consistent
readings.
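
The constant-ratio partitioning described above can be sketched as follows. This is only an illustration, assuming cluster labels have already been assigned to each data point by some clustering procedure; the function name, the default ratios, and the use of a seeded shuffle are assumptions, not details taken from Ref. 39.

```python
import random
from collections import defaultdict

def partition_by_cluster(labels, ratios=(0.5, 0.25, 0.25), seed=0):
    """Split sample indices into (training, testing, independent-testing)
    subsets so that each cluster contributes to every subset in roughly
    the same ratio. `labels[i]` is the cluster label of data point i."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)

    subsets = ([], [], [])
    for members in by_cluster.values():
        rng.shuffle(members)              # random assignment within a cluster
        n = len(members)
        cut1 = round(n * ratios[0])
        cut2 = cut1 + round(n * ratios[1])
        subsets[0].extend(members[:cut1])      # training
        subsets[1].extend(members[cut1:cut2])  # testing (overfit avoidance)
        subsets[2].extend(members[cut2:])      # independent testing
    return subsets
```

Calling `partition_by_cluster(labels)` returns three index lists; each cluster is split separately, so a sparsely populated mode is still represented in all three subsets whenever it holds enough points.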

Network training techniques have never
been fully disclosed in the literature,
but Refs. 35-38 provide an outline of a
procedure that has enjoyed a great deal
of success. The basic approach is to
synthesize the network with building-block
elements, first using a deterministic
algorithm [38] to select the best
input parameters and to establish the
connectivity of the network, then employing
a guided random-search algorithm
to obtain coefficient values that are
optimum in the global sense. Mathematically,
training is viewed as the realization
of a suitable hypersurface approximation
to the training data subset,
with the testing subset used to terminate
fitting of training data points before
overfitting occurs.

The building-block elements most
commonly used implement the bivariate
function

    y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2 .

Treating  all  possible  pairs  of  input 
parameters  in  succession,  the  data  in  the 
training  subset  are  fitted  with  this  func- 
tion; i.e.,  the  best  w is  found  for  each 
pair  of  parameters.  Then,  with  reference 
to  the  testing  subset,  the  best  M pair- 
wise combinations  are  retained;  these 
provide M new parameters for transformation
in a second layer of the evolving
network.  This  second  layer  is  synthe- 
sized in  a manner  analogous  to  that  of 
the  first,  and  so  on  for  each  additional 
layer.  At  any  layer  the  growth  of  the 
network  may  be  terminated.  When  this  is 
done,  the  network  output  becomes  the  out- 
put of  the  best  element  within  the  last 
layer,  and  all  elements  not  required  to 
generate the inputs for this output element
are discarded.
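
The synthesis of one layer, as described above, can be sketched in a few lines. This is a minimal illustration and not the algorithm of Ref. 38: it assumes a six-coefficient quadratic element, an ordinary least-squares fit on the training subset (the text does not state the fitting criterion), and mean-squared error on the testing subset as the ranking measure. All function names and the small `ridge` term added for numerical safety are assumptions.

```python
from itertools import combinations

def features(a, b):
    # Terms of the assumed six-coefficient bivariate element.
    return [1.0, a, b, a * b, a * a, b * b]

def solve(A, rhs):
    """Solve A w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def fit_element(xa, xb, y, ridge=1e-9):
    """Least-squares coefficients w0..w5 for one pair of inputs."""
    rows = [features(a, b) for a, b in zip(xa, xb)]
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(6)] for i in range(6)]
    rhs = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(6)]
    return solve(A, rhs)

def element_output(w, xa, xb):
    return [sum(wi * fi for wi, fi in zip(w, features(a, b)))
            for a, b in zip(xa, xb)]

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def synthesize_layer(train_cols, y_train, test_cols, y_test, M):
    """Fit every pair of inputs on the training subset, rank the fitted
    elements on the testing subset, and keep the best M.
    Returns a list of (test_error, i, j, coefficients)."""
    scored = []
    for i, j in combinations(range(len(train_cols)), 2):
        w = fit_element(train_cols[i], train_cols[j], y_train)
        err = mse(element_output(w, test_cols[i], test_cols[j]), y_test)
        scored.append((err, i, j, w))
    scored.sort(key=lambda s: s[0])
    return scored[:M]
```

The outputs of the M retained elements, evaluated on each subset, would then serve as the input columns for the next call to `synthesize_layer`.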

In general, the outputs of elements
in a given layer, L, are multinomials of
degree 2^L involving as many as 2^L input
parameters of the network. Despite this
geometric growth in network modeling
(viz., transformation) power with each
added layer, the computational burden
increases only linearly with L.
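
The geometric growth can be checked with a short recursion. The sketch below assumes that every element is a quadratic function of two inputs and takes the worst case in which the two inputs of an element share no network input parameters; the symbols d_L (degree) and n_L (number of input parameters reached) are introduced only for this illustration.

```latex
\begin{align*}
d_0 &= 1, & d_L &= 2\,d_{L-1} \;\Rightarrow\; d_L = 2^{L}, \\
n_0 &= 1, & n_L &\le 2\,n_{L-1} \;\Rightarrow\; n_L \le 2^{L}.
\end{align*}
```

For example, an element in the third layer can realize terms of degree up to 8 in as many as 8 input parameters.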

Additional layers are synthesized as
long as the performance (fitting accuracy)
of the network improves vis-a-vis
the testing subset of the data. But when
this performance starts to degrade, it is
clear evidence that the training data have
been overfitted, i.e., that the network
has grown too large. Growth is stopped
short of the layer at which overfitting
occurs. With this procedure, the network
may be trained on a very small amount of
data, yet possesses ability to generalize
accurately on its limited experience.
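
The stopping rule can be stated compactly. The sketch below assumes that the testing-subset error of the best element in each successive layer has been recorded in a list; the function name and the choice of "no improvement" as the stopping test are illustrative assumptions.

```python
def layers_to_keep(test_errors):
    """Return how many layers to retain, given the testing-subset error of
    the best element in each successive layer (index 0 = first layer).
    Growth stops short of the first layer that fails to improve."""
    keep = 0
    best = float("inf")
    for err in test_errors:
        if err >= best:
            break          # testing performance degraded: overfitting
        best = err
        keep += 1
    return keep
```

With errors of 0.9, 0.5, 0.4, and 0.45 for the first four layers, three layers are kept; a later layer that happened to score better would not be reached, since growth has already stopped.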

Once  the  deterministic  phase  of  the 
synthesis  is  completed,  expected  accuracy 
of  the  network  is  estimated  using  the 
independent  testing  data  subset.  Sometimes 
this  accuracy  is  not  sufficient  for  a 
given application, indicating that a synthesis
phase involving global optimization
of the network coefficients (w's) is
required.  Also,  when  the  network  is  put 
into  actual  service  as  a predictor,  it  may 
be  found  that  its  accuracy  gradually  dete- 
riorates, perhaps  because  of  changes  in 
properties  of  the  process  that  has  been 
modeled.  If  this  erosion  of  accuracy  is 
severe,  complete  re-training  might  be 
needed. Usually, however, predictor adaptation
involves only the network coefficients,
keeping the identities of input
parameters  and  the  connectivity  fixed. 

Thus  the  same  type  of  algorithm  may  be 
used  in  adaptation  as  in  the  final  phase 
of  synthesis,  although  newly-acquired  data 
must  replace  at  least  some  of  the  data 
used  during  initial  training.  The  specific 
algorithm that has been found to be most
suitable  in  both  cases  is  the  Guided  Accel- 
erated Random  Search  (see  Section  6 of 
this  paper).  When  using  a search  algo- 
rithm, just  as  in  the  deterministic  syn- 
thesis phase,  improvements  noted  with  re- 
spect to  a training  data  subset  are  tested 
with a second statistically-similar subset
before  their  acceptance  as  genuine. 

Latency reduction of the network is
the last act in the modeling methodology.
To  this  end,  the  surviving  input  parameters 
from  the  previous  steps  are  again  cluster- 
ed. Centroids,  called  the  prototypes,  of 
the  important  modes  in  the  data  are  deter- 
mined from  the  cluster  structure.  These 
prototypes  are  representative  of  the 
most-frequented operating points of the
system.  The  network  inputs  are  set  equal 
to  values  given  by  each  of  the  prototypes 
in turn, and for each prototype the network
output sensitivities to small excursions
performed one-at-a-time in its inputs
are found numerically. Mathematically,
this is the determination of

    \left. \frac{\partial y}{\partial x_i} \right|_{x = \bar{x}_j}

where y denotes the network output (the
estimated future value of a system variable),
x_i is an input parameter of the
predictive model, and \bar{x}_j is an input
prototype vector from the final clustering
analysis.  If  the  network  has  been  trained 
properly,  these  sensitivities  are  usually 
precise  reflections  of  the  true  physical 
process.  Noting  which  sensitivities  are 
large  in  magnitude  and  which  are  small, 
it is often possible to replace many of
the x_i's by their respective constants
from the \bar{x}_j's, changing these constant
inputs only when operating conditions of
the system are changed from one mode to
another. It is sometimes possible to
eliminate sensing of certain of the inputs
altogether.
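
The numerical determination of sensitivities at each prototype can be sketched as below. The sketch assumes the trained network is available as a callable that maps an input list to a scalar output, and it uses central differences; the step-size rule, the threshold test, and the function names are assumptions made for illustration.

```python
def sensitivities(network, prototype, rel_step=1e-4):
    """Central-difference estimate of d(output)/d(x_i) at one prototype,
    perturbing the inputs one at a time."""
    grads = []
    for i, xi in enumerate(prototype):
        h = rel_step * max(abs(xi), 1.0)
        up = list(prototype)
        dn = list(prototype)
        up[i] = xi + h
        dn[i] = xi - h
        grads.append((network(up) - network(dn)) / (2.0 * h))
    return grads

def negligible_inputs(network, prototypes, threshold):
    """Indices of inputs whose sensitivity magnitude stays below
    `threshold` at every prototype; these are candidates for being
    held constant at their prototype values."""
    n = len(prototypes[0])
    small = set(range(n))
    for p in prototypes:
        s = sensitivities(network, p)
        small &= {i for i in range(n) if abs(s[i]) < threshold}
    return sorted(small)
```

An input flagged by `negligible_inputs` would be replaced by the corresponding component of the active prototype and updated only when the system moves to another operating mode.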

5. Self-Organizing Control
Techniques

A characteristic  of  cybernetic  systems 
is  that  they  usually  incorporate  a hier- 
archy of  decision  and  control  levels, 
each  having  its  own  appropriate  structure 
of  sensors,  effectors,  and  information 
processors.  The  goal  for  a lower  level 
in  the  hierarchy  may  often  be  to  implement 
effectively  the  commands  of  the  next  higher 
level.  Sometimes,  however,  a difficulty 
can  arise  at  a given  level  that  pre-empts 
the  attention  of  the  levels  above  it, 
forcing them to re-organize their processing
logic so as to meet the contingency.

For  example,  a flight  vehicle  may 
nominally  have  outer  guidance  loops,  with- 
in them  certain  control  loops,  and  inter- 
nal to  all  of  these  a family  of  stabili- 
zation loops  (such  as  vehicle  angular 
position,  vehicle  angular  rate,  and 
actuator  displacement  and  rate  loops). 
Suppose  the  guidance  loop  has  a predic- 
tive model  that  assumes  a certain  nominal 
set  of  control-loop  dynamics  will  be 
maintained;  if  these  dynamics  undergo 
change,  the  predictive  model  could  be  re- 
trained. (The  previous  section  of  this 
paper  indicates  how  such  re-training 
might  be  accomplished.) 

Let  us  imagine  that  the  control  loops 
of  this  vehicle  normally  receive  a vector 
input  from  the  guidance  processor.  This 
input  consists  of  three  components  of  the 
desired  translational  acceleration.  The 
control  processor  normally  converts  this 
information  into  orientation  commands  for 
the pitch, yaw, and roll stabilization
axes,  taking  into  account  the  actual  accel- 
erations to  compute  its  error  signals.  The 
control processor may use a model of the
transfer  properties  of  the  stabilization 
loops,  each  of  which  has  the  goal  of  keep- 
ing an  axis  of  the  vehicle  pointed  in  a 
commanded  direction. 

Now,  consider  the  problem  of  re- 
organizing the  control  processor  in  the 
event  of  a contingency  arising  in  the  in- 
ertial orientation  sensing  used  by  the 
stabilization loops. For purposes of discussion,
suppose that a radiation sensor
is used as a back-up in the control subsystem
for measuring the magnitude, B, of
an off-boresight error angle (but not the
direction of this error). Now ask: "If
the attitude stabilization loops are lost
due to failure of the inertial reference
unit, could the control processor change
its logic in such a way as to solve the
problems of vehicle stabilization and
control, given only B(t)?"

At  first  glance,  this  illustrative 
problem  may  seem  to  be  expecting  too  much 
of  the  control  processor,  because  not  only 
must  it  deal  with  a radically  different 
set  of  dynamics  for  its  controlled  object, 
or  plant,  after  the  failure,  but  it  must 
also  generate  a 3-vector  command  (pitch, 
yaw,  and  roll  actuators)  from  a single 
scalar input variable, B(t). Nevertheless,
this  problem  has  a solution;  the  type  of 
control  processor  that  has  these  capabil- 
ities has  been  the  subject  of  considerable 
study since the early 1960's.

A reasonably complete picture of
the development of self-organizing control
techniques, suitable for the above problem
and for many other applications (usually
for primary rather than back-up control),
is provided by Refs. 29, 34, and 40-80.
However, what these references do not make
entirely clear is that self-organizing
controllers (SOC's) are now being widely
applied in communications systems, antenna
control, electronic warfare, steelmaking,
Injury recovery control systems, and
other areas in addition to flight vehicle
control. Also, reading the references, it
is easy to lose sight of the essential
unity between the fields of adaptive
prediction, self-organizing control, and
guided random search. Specifically: (1)
a multivariate SOC usually employs a form
of guided random search that acquires information
about the process being controlled,
(2) SOC's generally act to control
predicted future states of process responses.

This unity is borne out in one ongoing
industrial application [77] where SOC
logic  has  been  merged  with  predictive  net- 
works to  create  an  extremely  powerful  cy- 
bernetic system.  Both  the  SOC  logic  and 
the  adaptive  networks  in  this  system  use 
guided  random  searches,  but  the  frequency 
of  adaptations  within  the  SOC  logic  (which 
repeatedly  interrogates  the  networks)  is 
considerably  higher  than  in  the  networks. 
This  illustrates  a further  principle; 
in  hierarchical  cybernetic  systems,  adap- 
tation processes  usually  take  place  most 
rapidly  in  the  innermost  decision  and 
control  loops  and  progressively  less  rap- 
idly as  one  moves  outward.  Ultimately, 
the  level  of  an  inflexible  objective  func- 
tion (goal)  is  reached;  this  outermost 
function,  which  may  or  may  not  be  explicit, 
is  not  adaptive. 


Two  avenues  of  SOC  development  have 
been followed. In both, the SOC identifies
the  values,  or  at  least  the  signs,  of  mem- 
bers of  the  open-loop  actuation  gain  matrix 
of  the  controlled  plant,  using  this  infor- 
mation to  realize  satisfactory  control. 

Both  approaches  employ  small  random  exci- 
tations so  that,  through  observation  of 
plant  responses,  the  requisite  matrix 
information can be acquired. Rates of
change of plant responses are monitored
and correlations between excitation signals
and these responses are computed to
identify  polarities  and/or  magnitudes  of 
the  gain  matrix  elements. 
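
The idea of identifying gain-matrix polarities by correlating small random excitations with observed response rates can be illustrated as follows. This is a simplified sketch, not the SOC logic of the references: it assumes a static plant whose response rates are available as a callable `plant_rate(u)`, uses independent random-sign excitations of fixed amplitude, and accumulates a plain product correlation. All names and default values are assumptions.

```python
import random

def identify_gain_signs(plant_rate, m, n, trials=4000, amplitude=0.01, seed=0):
    """Estimate the polarity of each element of an n x m gain matrix.
    `plant_rate(u)` returns the n response rates produced by the
    m-component excitation vector u."""
    rng = random.Random(seed)
    corr = [[0.0] * m for _ in range(n)]
    for _ in range(trials):
        # Small random excitation applied to every actuator at once.
        u = [amplitude * rng.choice((-1.0, 1.0)) for _ in range(m)]
        r = plant_rate(u)
        for i in range(n):
            for j in range(m):
                corr[i][j] += r[i] * u[j]   # excitation/response correlation
    # Sign of the accumulated correlation gives the estimated polarity.
    return [[(c > 0) - (c < 0) for c in row] for row in corr]
```

Because the excitations on different actuators are statistically independent, the cross terms average toward zero and the accumulated correlation for element (i, j) is dominated by the corresponding gain.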

In the original approach, described
in Refs. 34, 47-49, 53, 55, 57-62, 65-69,
71-73, and 75, the SOC logic is partitioned
into two types of modular devices. The
first,  referred  to  as  a performance  assess- 
ment unit,  computes  a binary  value  signal 
indicative  of  the  trend  in  error  perfor- 
mance of  the  system.  This  value  signal  is 
the logical product of the sign of a component
of the predicted error and the sign
of  the  appropriate  component  of  the  rate 
of  change  of  plant  acceleration.  When 
acceleration  is  changing  in  such  a way  as 
to  reduce  the  predicted  error  of  the  sys- 
tem, the  value  signal  becomes  positive; 
otherwise,  it  is  negative.  The  other  mod- 
ular function,  referred  to  as  the  actua- 
tion-correlation logic  unit,  generates  a 
component  of  the  control  excitation  sig- 
nal by  forming  the  product  of  a function 
of  a component  of  the  predicted  error 
signal  and  a high-frequency  polarity  de- 
cision signal.  The  decision  signal,  up- 
dated at  a frequency  considerably  higher 
than  the  natural  frequency  of  the  plant, 
is  itself  the  output  of  a statistical  ex- 
periment generator  that  is  biased  by  a 
function  of  the  computed  short-term  cross- 
correlation between  the  received  value 
signal  and  the  polarity  signal  from  the 
prior  decision  time.  The  SOC  modules  are 
combined  in  a controller  for  which  the  in- 
puts are  the  desired  and  measured  responses 
of  the  system  and  from  which  the  outputs 
are  the  excitation  signals  to  plant  actua- 
tors. For  the  case  of  a plant  having  m 
actuators and n commanded response variables,
m > n,  the  original  SOC  employs  n perfor- 
mance assessment  units  and  m x n actuation- 
correlation  logic  units,  each  of  the  latter 
corresponding  to  one  member  of  the  acceler- 
ation gain  matrix. 
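
The value signal and the module count can be sketched briefly. The sign convention here is an assumption: the predicted error is taken as commanded minus predicted response, so that a positive error is reduced when acceleration is increasing. The function names are hypothetical, and a zero return for a zero input is a convenience of this sketch rather than something stated in the text.

```python
def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def value_signal(predicted_error, accel_rate):
    """+1 when acceleration is changing so as to reduce the predicted
    error, -1 when it is changing so as to increase it (0 if either
    input is exactly zero)."""
    return sign(predicted_error) * sign(accel_rate)

def module_counts(m, n):
    """Modules in the original SOC for m actuators and n commanded
    responses: n performance assessment units and m x n
    actuation-correlation logic units."""
    return {"performance_assessment": n, "actuation_correlation": m * n}
```

For a plant with four actuators and three commanded responses, `module_counts(4, 3)` gives 3 performance assessment units and 12 actuation-correlation logic units.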

The  original  SOC  logic  designs  iden- 
tify polarities  of  the  acceleration  gain 
matrix elements. A recent derivation by
0. Cleveland and L. O. Gilstrap, Jr. [70,
78] provides a mathematical foundation for
identification of both the magnitudes and
polarities of these quantities, and this
newest technique has also been realized
in working hardware [78].

6. Guided Random Search
Procedures

The  preceding  discussion  and  the  lit- 
erature it  references  bear  out  the  thesis 
that parameter searches are required by adaptive
cybernetic systems so as to realize
self-adjustment of their behavior. It is
often necessary that these searches solve
a global optimization problem, i.e., locate
the best extremum from among multiple performance
modes (hills or valleys). In
nearly  all  cases  rapid  convergence  is  re- 
quired in  a noisy  signal  environment  and 
for  simultaneous  (or  nearly  simultaneous) 
adjustment  of  multiple  parameters. 

Several types of search logic are
treated in the literature. The major types
are systematic, gradient (steepest descent)
and random searches. [25, 30, 33-37, '<9, 60,
90, 96, 98, 100, 115]

Systematic searches employ an exhaustive
survey of all possible parameter
values and are, therefore, capable of finding
the global extremum of a multimodal
performance function. The drawback to this
type of search is that it can be much too
time consuming in practical applications.
Also, the systematic search is overly noise
sensitive and may have an unacceptable
search loss (low average performance during
the search) because it does not dwell principally
in regions of good performance, unless
such regions encompass most of the
parameter space.

Gradient  searches  are  often  an  effec- 
tive way  for  seeking  a local  extremum,  par- 
ticularly for  spaces  of  low  dimensionality, 
but  are  incapable  of  solving  the  multimodal 
search problem unless coupled with other
methods. Gradient searches may also have
other  drawbacks,  as  implied  in  the 
following  discussion. 

Random searches [82-86,88-115] are of
two basic types, unguided and guided. Unguided
random searches are a form of exhaustive
search, but with a random strategy
for  trial  generation.  Guided  random 
searches  achieve  enhanced  rates  of  conver- 
gence via  probabilistic  control  and  certain 
heuristic  principles,  as  will  be  discussed. 
Random  searches  of  both  types  are  inherent- 
ly suited  for  multimodal  search  and  optimi- 
zation problems;  however,  only  the  guided 
random  searches  are  of  practical  interest 
in most engineering applications.

A particularly powerful form of
guided random search, known as the Guided
Accelerated Random Search (GARS) [30, 33-37,
60, 77, ^1, 100, 114, 115], contains two phases,
with control of the search switched back
and forth between these phases as certain
events occur. In the random phase, an information
gathering phase, values of parameters
in the search are chosen at random
but subject to a multivariate probability
distribution function (pdf) that governs
the relative amounts of time spent in
various regions of the parameter space.
In the deterministic phase, information
acquired from the random experimentation
is exploited in accordance with appropriate
heuristic rules.

Whereas unguided random searches employ
uniform pdf's to govern selection
of all trial values of the parameters, the
uniform  distribution  is  used  only  in  the 
opening  stage  of  GARS.  Afterward  the  pdf 
is  shaped  to  hasten  convergence  to  local 
solutions and, ultimately, the global solution.
Then, as the search nears completion,
the number of parameters being
simultaneously  searched  is  sometimes 
lowered  from  the  total  number  of  param- 
eters in  the  search  to  some  fraction  there- 
of, selected  at  random  for  each  trial. 

The  random  phase  of  GARS  may  com- 
prise distinct  strategies  suitable  for 
the opening, middle, and final stages of
the search. In the opening stage, a
uniform pdf governs search trials, so as
to  conduct  the  exploration  with  equal 
probability  throughout  the  parameter  space. 

In the middle stage, two techniques exist in GARS by which the
standard deviation of random steps may be reduced so as to guide,
i.e., quicken, the further trials. The first of these techniques
produces an explicit identification of the modes of the parameter
space. For this purpose, results of the opening search stage are
clustered numerically so as to reveal regions of highest and lowest
performance and to characterize these regions in terms of their
centroids and variances with respect to each of the parameters [116].
The second technique selects some best fraction of the trials
conducted in the opening stage and initiates further random trials
that are bunched about the points in that fraction. These further
trials are conducted with a relatively small step-size standard
deviation and have, as their purpose, identification of the modal
structure of the performance surface.
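
The second middle-stage technique can be illustrated with a short
sketch. It is only an illustration under stated assumptions: the
function name `bunched_trials`, the Gaussian form of the bunching
pdf, and the default values of `best_fraction`, `per_point`, and
`sigma` are hypothetical and are not taken from the GARS references.
Higher scores are assumed to be better.

```python
import random


def bunched_trials(opening_results, best_fraction=0.1, per_point=5,
                   sigma=0.05, seed=0):
    """Generate new trial points bunched about the best opening-stage trials.

    opening_results is a list of (score, params) pairs from the opening stage.
    """
    rng = random.Random(seed)
    # Rank the opening-stage trials, best score first.
    ranked = sorted(opening_results, key=lambda r: r[0], reverse=True)
    n_keep = max(1, int(best_fraction * len(ranked)))
    trials = []
    for _, params in ranked[:n_keep]:
        for _ in range(per_point):
            # Small-standard-deviation steps about each retained point.
            trials.append([p + rng.gauss(0.0, sigma) for p in params])
    return trials
```

Scoring these bunched trials and clustering the results would then
expose the modal structure of the surface; the clustering step itself
is not shown here.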

The final stage of GARS is entered when the best-to-date performance
reaches a level near to the asymptotic performance expected of the
search. In this stage, the step size may be controlled partly by the
measured best-to-date performance in such a way that the magnitudes
of search steps shrink to very small values as peak performance is
neared. Additionally, an "activity factor" is sometimes used in the
final stage as a means for reducing the dimensionality of the random
perturbations, ultimately converting the final-stage search from a
simultaneous exploration involving all parameters to a more nearly
sequential search involving a small fraction of the parameters in any
one step. This fraction is randomly selected for each new trial.
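
A minimal sketch of a final-stage trial step follows. The name
`final_stage_step`, the linear rule that shrinks the step size with
the remaining performance gap, and the default `activity` value are
assumptions made for illustration; the text says only that the step
size is controlled "partly" by best-to-date performance and does not
give a formula.

```python
import random


def final_stage_step(best_params, best_score, target_score,
                     base_sigma=1.0, activity=0.25, rng=random):
    """Propose one final-stage trial point around the best-to-date setting."""
    # Assumed rule: step size shrinks linearly as the best-to-date score
    # approaches the asymptotic score expected of the search.
    gap = max(target_score - best_score, 0.0)
    sigma = base_sigma * gap / (abs(target_score) + 1e-12)
    # Activity factor: perturb only a randomly chosen fraction of parameters.
    n = len(best_params)
    k = max(1, round(activity * n))
    active = set(rng.sample(range(n), k))
    trial = [p + rng.gauss(0.0, sigma) if i in active else p
             for i, p in enumerate(best_params)]
    return trial, active
```

Because `active` is redrawn on every call, successive trials explore
different small subsets of the parameters, which approximates the
"more nearly sequential" search described above.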

In most applications, noise in measurements of system performance can
produce deceptively good results for any one experiment. To avoid the
risk of spurious measurements locking the search on a false solution,
the random phase of GARS periodically re-examines the performance it
achieves at the supposed best-to-date setting of parameters. Each new
measurement of performance for this setting is used to refine the
estimate of best-to-date performance. Furthermore, to compensate for
non-stationarity of the performance surface, control of the search is
returned periodically to the logic for the previous stage.
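
The re-examination of the best-to-date setting can be sketched as a
running average. This is an assumption for illustration: the text
says each new measurement "refines the estimate" but does not say
how, and the class name `BestToDate` and its method are hypothetical.

```python
class BestToDate:
    """Running-mean estimate of performance at the best-to-date setting."""

    def __init__(self, params, first_measurement):
        self.params = list(params)
        self.mean = first_measurement
        self.count = 1

    def remeasure(self, measure):
        """Repeat the experiment at the same setting and refine the estimate."""
        value = measure(self.params)
        self.count += 1
        # Incremental update of the arithmetic mean of all measurements.
        self.mean += (value - self.mean) / self.count
        return self.mean
```

A single lucky measurement then loses influence as repeated
measurements accumulate, so a spuriously good setting is eventually
displaced by a genuinely better one.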

A useful heuristic principle used in the random phase of GARS is
called "reversal." Whenever a random experiment fails to produce a
performance improvement (computed relative to the prior best-to-date
performance), a step of equal magnitude in the opposite direction is
taken (using the best-to-date parameter setting to define the point
of departure). This step in the opposite direction is used in the
expectation that a performance improvement will often be found by
moving exactly opposite to an unsuccessful direction.

The deterministic phase in GARS, used in alternation with the random
phase, is entered whenever a new best-to-date performance value is
obtained. In the deterministic phase, additional basic heuristics
that speed convergence are employed. These heuristics act to exploit
the information gained from a successful random experiment. The
heuristic principles used include:

(1) Repetition -- A successful step is followed by another in the
same direction.

(2) Acceleration -- The magnitudes of further steps are lengthened in
an arithmetic or geometric progression. This produces rapid progress
even though the initial successful step may have been quite small.
Then, whenever an accelerated step produces an unsuccessful outcome,
the search "backs up" to establish the approximate location of the
maximum or ridge that has been traversed. The deterministic phase is
exited once this deceleration is completed.
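
The interplay of the two phases can be sketched as follows. This is
an illustrative simplification, not the GARS algorithm itself: the
function name `gars_like_search`, the fixed Gaussian step pdf, the
geometric `growth` factor, and the treatment of "backing up" as
simply discarding the failed accelerated step are all assumptions.
The score is maximized and is assumed to be bounded above, so that
acceleration always terminates.

```python
import random


def gars_like_search(score, start, sigma=0.5, trials=200, growth=2.0, seed=0):
    """Maximize score(params) with a random phase and a deterministic phase."""
    rng = random.Random(seed)
    best = list(start)
    best_score = score(best)

    def shifted(point, step, scale):
        return [p + scale * s for p, s in zip(point, step)]

    for _ in range(trials):
        # Random phase: draw a trial step about the best-to-date setting.
        step = [rng.gauss(0.0, sigma) for _ in best]
        trial = shifted(best, step, 1.0)
        trial_score = score(trial)
        if trial_score <= best_score:
            # Reversal: take the equal and opposite step from the same point.
            step = [-s for s in step]
            trial = shifted(best, step, 1.0)
            trial_score = score(trial)
            if trial_score <= best_score:
                continue  # neither direction helped; remain in the random phase
        # Deterministic phase: repetition with geometric acceleration.
        scale = 1.0
        while trial_score > best_score:
            best, best_score = trial, trial_score
            scale *= growth
            trial = shifted(best, step, scale)
            trial_score = score(trial)
        # The failed accelerated step is dropped, leaving the last good point.
    return best, best_score
```

For example, `gars_like_search(lambda p: -sum((x - 3.0) ** 2 for x in p), [0.0, 0.0])`
moves the two parameters from the origin toward the peak at (3, 3).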

Rapid convergence is important for search techniques used in
cybernetic systems. Rastrigin [89] has shown that the mean rate of
convergence of a random-direction search will exceed that of a
fixed-step-size steepest descent procedure when the criterion
function is unimodal and the number of parameters exceeds four. An
adaptive-step-size random search has been shown by Schumer and
Steiglitz [101] to be even more efficient. They show that for
gradient methods the number of criterion evaluations increases in
proportion to the square of the number of parameters (N) and the
computation time increases as N' , while for the adaptive-step-size
random search the number of evaluations increases in proportion to
the first power of N and the computation time as N' . Gilstrap [cited
in 60, 100] states that the expected number of criterion evaluations
for GARS, divided by the number for a fixed-step-size steepest
descent method, is approximately

^ log.  N 
(or  N 2 3. 

In summary, the advantages of guided random searches such as GARS
are:

(1) the search is multimodal, i.e., convergence is independent of
    initial conditions;

(2) very rapid convergence can be achieved, even for a large number
    of parameters, making real-time adaptation feasible for major
    systems;

(3) there are no difficulties with step-size control, gain factors,
    and the like;

(4) the search is relatively insensitive to noise corruption of the
    performance criterion measurements; and

(5) any computable performance criterion may be used.

7. Applications and Hardware Trends

The engineering applications of cybernetic systems are, in a very
literal sense, nearly as old as technology. (For example, Hammurabi,
king of Babylon in the 20th century B.C., devised a feedback
mechanism that regulated the water level in an irrigation system.)
But until about ten years ago, there was little conscious employment
of modern cybernetics theory to aid in realization of goal-directed
systems and exploitation of these systems in difficult modeling,
control, and optimization tasks. Today, however, the catalog of
applications receiving active attention (within the modern context)
is large and is growing rapidly.

The  accompanying  table  and  the  cited 
references  list  only  those  projects  of 
which  the  author  has  knowledge;  no  claim 
can  be  made  for  the  completeness  of  this 
enumeration.  It  is  seen  that  modern  cy- 
bernetic systems  are  serious  candidates  in 
numerous  areas  of  application.  And  hard- 
ware for  these  new  systems  has  been  or  is 
being developed in many areas.

SUMMARY OF APPLICATIONS OF CYBERNETIC SYSTEMS

[Table: only two rows are legible, "Analysis of Head Injury Data" and
"Head Injury Patient Recovery Control"; each carries an "x" in two
columns, and the second row also shows the entry "6".]

*The symbol "x" denotes no publication in open literature.

Aircraft fly-by-wire SOC logic was successfully flight tested by
AFFDL and AFAL in 1969.

AFAL brassboard flight tests of adaptive ECM equipment are taking
place at the present time, and AFAL is also currently sponsoring
development of brassboard adaptive ESM equipment.

AMRL has successfully demonstrated breadboard SOC logic developed for
the RPV man-machine interface.

A. E. Zeger presents a paper at this conference on the AMTI phased
array adaptive antenna immobilization system, now entering a
breadboard development stage under AFAL sponsorship. This system
incorporates a form of GARS search in SOC logic to minimize Doppler
clutter spread of the radar signals.

One particularly important hardware trend is toward custom
large-scale integration of digital circuits for adaptive learning
networks. D. Hampel reports on this AFAL project in a paper at the
present conference. Hardware development of adaptive learning
networks has also been supported by Armco Steel Corporation. Armco is
using predictive control via an adaptive learning network for the
finishing of hot strip steel and is developing an application for
control of an important melting process.

8. Concluding Remark

An  overview  of  the  theory  and  ap- 
plication of  cybernetic  systems  has  been 
presented.  Because  of  security  restrictions 
and  the  proprietary  nature  of  many  aspects 
of  this  new  field,  and  also  because  of  space 
limitations,  it  has  been  necessary  to  omit 
many  significant  details.  It  is  hoped, 
nevertheless,  that  this  survey  has  convey- 
ed a feeling  for  the  promise  and  vitality 
of  work  that  is  taking  place  in  cybernetic 
systems.

APPENDIX B


Networks that learn are useful in computer aided design and
manufacturing to improve process performance, reduce costs, and
decrease the need for engineering analyses. Here's how they learn and
how they are applied.


Learning  Networks  Improve 
Computer-Aided  Prediction  and  Control* 


Roger L. Barron

Adaptronics,  Incorporated 
McLean,  Virginia 


A quiet revolution is spreading in the way that engineers think about
manufacturing processes. In the past, these processes were designed
and operated on the basis of either doctrinal or engineering
concepts. Analysts and engineers have attempted to write instructions
for control computers in the same way that they instruct human
operators in how to perform appropriate manual control functions.
Thus applications of computer-aided manufacturing during the first
two decades of the computer era have consciously or unconsciously
taken the form of one-for-one replacements of human operator
functions by equivalent computer functions. This concept of
replacement has maintained or even reinforced traditional reliance on
analytically derived models of processes.

Now, however, speed, memory, and dependability of computers are
leading to fresh approaches to modeling and control of manufacturing
processes, and a number of exciting new methods are receiving
attention, one of which is the "learning network" approach. Networks
for computer aided prediction and control, which may be implemented
in computer software or peripheral hardware, can learn to predict
trends in a process from the natural data it produces. The process is
characterized entirely from its observable variables rather than by a
set of theoretical equations; its true characteristics are embodied
in the natural stream of data that flows from it. The problem is thus
one of generating a process model from data rather than one of
estimating what the data will be.

At the very least, this way of thinking can augment traditional
approaches. The learning network identifies which variables play
significant roles in the behavior of a process and shows how these
variables interact with each other (usually nonlinearly) in
determining just what the process will do. Thus the network points
the way toward improved theory for traditional modeling work.
Alternatively, the network can model the most uncertain aspects of a
process, leaving the already fairly well understood aspects to
theoretical derivations.

Learning networks themselves can infer and predict process behavior
very accurately. Unlike most predictive models, errors made by these
networks do not necessarily increase cumulatively with increasingly
longer forecast intervals. Also, they can adapt to changing process
characteristics, keeping themselves up to date without requiring
tuning or redesign by human specialists.

Perhaps most exciting, learning networks are able to predict from
data that are produced "naturally" by processes, including records of
sounds or vibrations, subjective evaluations of product quality, or
whatever variables are readily accessible for economical measurement.
It is only necessary that the natural variables, taken in concert,
contain information for the inferences or predictions to be made.

In other words, computer aided manufacturing (CAM) systems need not
rely on instrumentation of state variables and other conveniences of
older theories about control. Often there is no way to instrument
such

*This article is based on a paper presented by Mr. Barron at the
1975 CAD/CAM Conference of the Society of Manufacturing Engineers,
held in Chicago, Ill.

Learning Networks

Inference and prediction problems involve operations with sensor data
obtained as a result of observing a physical process. The classical
approach to the computer model has been to use all relevant
deterministic or statistical characteristics of the process being
observed, making certain assumptions in design calculations. Very
often the designer presumes the structure of the model and merely
calculates values of certain parameters. Even if the nature of the
observed process changes, the structure of the model often does not
change; but the designer adjusts the parameter values in response to
measured changes in inputs or outputs.

In many important applications, observable inputs are difficult to
describe analytically. The best or even a good structure for the
model cannot be determined in advance. In this case, a desirable
model structure can adjust to representative inputs. That is, the
model is trainable in both its structure and parameter values.

Trainability in structure implies the existence of interconnections
of similar elementary building blocks in a network. This network
realizes a general (usually nonlinear) function of certain input
variables.

Polynomial and Multinomial
Approximations

Suppose that the input consists of N observables x_1, x_2, ..., x_N,
and the output, y, is a scalar quantity whose value is the estimate
of a particular property of the input process. In general, y will be
a nonlinear function of the x's. Under fairly general conditions,
this function of N variables can be expressed in an N-dimensional
Maclaurin series:

$$y = a_0 + \sum_i a_i x_i + \sum_i \sum_j a_{ij} x_i x_j + \sum_i \sum_j \sum_k a_{ijk} x_i x_j x_k + \cdots$$

Although in the most general case the coefficients are functions of
time, underlying characteristics of the x's often do not depend on
time, so that the coefficients are constants.

To apply this Maclaurin series, identities of the observables or
measurements, x_i, must be known, along with the number of terms in
the series needed to provide an acceptable approximation to the
desired function, even though this function is itself not known. To
determine the observables, all those that are thought to have a
bearing on the desired output are used at first, and the ones whose
trial shows them to be of little use are later discarded. The number
of terms is determined adaptively with a trainable nonlinear network
of interconnected elements, each of which implements a simple
second-order function of two inputs and one output, with six constant
coefficients:

$$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2$$

This function is identical to the first six terms of the Maclaurin
series for two variables. Components of the element may be realized
using digital devices. A network of elements can be trained, by
adjusting values of coefficients, to approximate the N-dimensional
series.

Fig 1 Learning network element. A digital mechanization of the
functions shown here is used to implement the first six terms of a
Maclaurin series that expresses a property or characteristic of a
process in terms of its inputs. It has only two inputs, but networks
of elements such as this can combine all the inputs in pairs. Usually
fewer than all possible pairs are necessary.
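
In software, a single element of this kind reduces to a few
multiplications and additions. The sketch below is a minimal
illustration; the function name `element` and the ordering of the six
coefficients in `w` are assumptions, since the article specifies only
that the element realizes the first six terms of the two-variable
Maclaurin series.

```python
def element(x1, x2, w):
    """Second-order, two-input element with six constant coefficients."""
    w0, w1, w2, w3, w4, w5 = w
    # Constant, two linear terms, one cross product, and two squares.
    return w0 + w1 * x1 + w2 * x2 + w3 * x1 * x2 + w4 * x1 ** 2 + w5 * x2 ** 2


def two_layer(x, w_a, w_b, w_out):
    """Feed two first-layer elements into one second-layer element."""
    za = element(x[0], x[1], w_a)
    zb = element(x[1], x[2], w_b)
    # The second layer sees products of quadratics, hence up to degree 4 in x.
    return element(za, zb, w_out)
```

Cascading elements in this way is what lets a network of simple
quadratic blocks represent higher-degree terms without any one block
growing more complex.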

Networks of the
Basic Element

In a network of two layers of these simple elements (Fig. 2), each
input, z, to the summation contains pairwise products of the network
inputs, x_i, up to degree 4, while the first layer contains all
possible pairs of three inputs. To implement a general multinomial
expression (merely a polynomial in many variables), the number of
elements in each layer would have to grow as one proceeds deeper into
the network. However, it is found empirically that acceptable
approximations are obtained without this growth; in fact, the number
of elements in successive layers decreases, usually after only two or
three layers, until only a few are left as inputs to the adder.

Known Data Set

Determining the coefficients of each network element and the number
and interconnections of those elements requires a known data base,
that is, a data base for which the values of the dependent variable
are known. Steps involved are (1) optimizing the coefficients in each
element of the first layer, (2) selecting those elements whose output
is acceptable while rejecting the poor performers, (3) repeating
steps (1) and (2) for the remaining layers, and (4) optimizing all
coefficients in all layers based upon network output.

The known data base is divided into three independent but
statistically similar subsets: a fitting subset, to determine
coefficients of the elements; a selection subset, to reject the poor
performers; and an evaluation subset, to evaluate overall
performance. Fitting and selection subsets are also used for global
optimization. Since the evaluation subset is not used for network
synthesis, performance on it accurately estimates the network's
ability to generalize to new, previously unseen data.


Fig 2 Two-layer network. Three inputs combine into three pairs at the
first level of elements such as that of Fig. 1. At the second level,
outputs of the first level recombine in pairs to generate up to
fourth-degree signals. Theoretically, later stages would snowball in
number and complexity of interconnections, but in practice they do
not.


Training  the  Network 

Element coefficient determinations are based, in part, upon a least
squares fit to a desired output, whereby the elements are first
adjusted by a matrix algebraic procedure and then by a recursive
search optimization procedure. (Other criteria are of course
possible, and are often used.)

Fitting and selection subsets are used alternately in training each
layer. First, N specific observables that are the inputs to each
element are chosen, more or less arbitrarily, and arranged into
N(N-1)/2 pairs, feeding a like number of trainable elements, such as
that shown in Fig. 1. Then the fitting subset of the known data base
is applied to establish the coefficients, using a recursive search
procedure with a least squares criterion. The procedure is repeated
for each of the N(N-1)/2 elements.

Not all pairwise combinations are significant in extracting the
desired information. The selection process, using the selection
subset, eliminates those elements whose performance is not
acceptable, as measured by the square of the error magnitude. There
are now, say, K elements that survive.

The process is repeated for the second layer, which initially
contains K(K-1)/2 elements, involving all pairs of the surviving
elements in the first layer, which now is again fed by the fitting
subset. Coefficients of each element in the second layer are
determined as in the first. Then the selection subset is fed a second
time into the first layer and the unacceptable pairs eliminated from
the second layer.

The process is repeated with succeeding layers until the error rate
on the selection subset reaches a suitably low level. Although
further reductions in error rate on the fitting subset could be made
by incorporating additional layers, to do so would produce
overfitting of the fitting data. Eventually, a single output results
from each of several disjoint subnetworks; these outputs are added to
produce a single output from the entire network.
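
The fit-then-select cycle for one layer can be sketched as below.
This is a simplified illustration under stated assumptions, not the
procedure used by the author: coefficients are obtained from the
normal equations alone (the recursive search refinement is omitted),
a small `ridge` term is added for numerical safety, survivors are
chosen as the `keep` lowest-error pairs rather than by an error
threshold, and all function names are hypothetical.

```python
from itertools import combinations


def features(a, b):
    """The six terms of the two-input element for one observation."""
    return [1.0, a, b, a * b, a * a, b * b]


def solve(A, y):
    """Solve the square system A w = y by Gauss-Jordan elimination."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_element(a, b, y, ridge=1e-9):
    """Least-squares coefficients of one element from the fitting subset."""
    X = [features(p, q) for p, q in zip(a, b)]
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(6)] for i in range(6)]
    g = [sum(r[i] * t for r, t in zip(X, y)) for i in range(6)]
    return solve(A, g)


def predict(w, a, b):
    return [sum(wi * fi for wi, fi in zip(w, features(p, q)))
            for p, q in zip(a, b)]


def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def train_layer(fit_cols, fit_y, sel_cols, sel_y, keep):
    """Fit every input pair, then keep the best by selection-subset error."""
    scored = []
    for i, j in combinations(range(len(fit_cols)), 2):
        w = fit_element(fit_cols[i], fit_cols[j], fit_y)
        err = mse(predict(w, sel_cols[i], sel_cols[j]), sel_y)
        scored.append((err, i, j, w))
    scored.sort(key=lambda s: s[0])
    return scored[:keep]
```

The outputs of the surviving elements, computed with `predict`,
would serve as the input columns for the next call to `train_layer`,
and layers would be added until the selection-subset error stops
improving.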

A hypothetical example of the result of the training process to this
point (Fig. 3) implies that at least 30 candidate parameters were
initially inserted into the first layer, of which only a few
survived. The figure shows that one pair of inputs interacts with a
second pair, but that two other pairs do not interact with each other
or with the other pairs. Thus the outputs of three disjoint
subnetworks are added to produce a single output.

Fig 3 Illustrative learning network. Although 30 or more inputs were
hypothesized as contributing to the process controlled by this
network, only eight survived the training process as contributing
significantly to the output. Furthermore, only four of the eight
required a subnetwork more than one level deep. Typically, learning
networks have many more elements than shown here.

A final step in the training process is a vernier adjustment, or fine
tuning, of the coefficients. This may arise because the coefficients
of each element have been adjusted in the absence of interactions
with other elements following them in the network; optimum
coefficient values may be different when these interactions are
present. Fitting and selection subsets are also used for this final
adjustment process. The vernier adjustment (a global search) may use
a random technique to obtain final values of the coefficients, as
well as for subsequent network adaptation. After final adjustment of
coefficients, the evaluation subset is used to estimate performance
of the entire network.

Avoidance of overfitting is a key aspect in the training of learning
networks. Good functional approximations to the fitting data subsets
must be obtained that also closely approximate the data in the
separate selection subsets; that is, the networks can be taught to
generalize properly on their experience in fitting the points in the
first subsets, so that error rates in later uses will be low. If
overfitting is not avoided, the network produces deceptively small
errors in approximating its first sets of data and then, in most
cases, does poorly on subsequent new data. Often heard of are
empirical models that appear to have much promise initially but that
produce unacceptable errors when presented with new data; in most
cases, such behavior is the result of overfitting. By using three
independent subsets of the available data, taking care that each is
statistically representative of the whole data base, the problem of
overfitting is virtually eliminated and good advance estimates of
operational error levels of the models can be obtained.
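
A simple way to form the three subsets is a shuffled split, sketched
below. The function name `split_three`, the 50/25/25 proportions, and
the use of a plain random shuffle are assumptions; the article does
not give proportions, and a random shuffle only approximates the
requirement that each subset be statistically representative of the
whole data base.

```python
import random


def split_three(records, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle records and split into fitting, selection, evaluation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_fit = int(fractions[0] * len(shuffled))
    n_sel = int(fractions[1] * len(shuffled))
    fitting = shuffled[:n_fit]
    selection = shuffled[n_fit:n_fit + n_sel]
    # Whatever remains is held out and never used during network synthesis.
    evaluation = shuffled[n_fit + n_sel:]
    return fitting, selection, evaluation
```

Where the data base is ordered in time or grouped by operating
condition, a stratified split would be a safer choice than a plain
shuffle.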

If a model realized by a learning network can be guaranteed not to be
overfitted, it will be a smoothly fitted, functional approximation.
Mathematically, this approximation is a continuous and differentiable
function, derivatives of which closely approximate the quantitative
derivative behavior of the real processes that are modeled. For this
reason, numerical partial derivatives may be computed which reveal
the quantitative sensitivity of the modeled variable (y), and thus of
the process that has been modeled, to small variations in specified
values of the network input variables.
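
Such sensitivities can be estimated by central differences, as in the
following sketch. The function name `sensitivities` and the relative
step size are assumptions; `model` stands for any trained network
that maps a list of input values to a predicted y.

```python
def sensitivities(model, x, rel_step=1e-4):
    """Central-difference partial derivatives of model(x) for each input."""
    grads = []
    for i, xi in enumerate(x):
        h = rel_step * max(abs(xi), 1.0)
        up = list(x)
        up[i] = xi + h
        down = list(x)
        down[i] = xi - h
        # Symmetric difference about the operating point x.
        grads.append((model(up) - model(down)) / (2.0 * h))
    return grads
```

The estimates are meaningful only at points inside the region
covered by the training data, where the network is a trustworthy
approximation of the process.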

As will be seen, ability to interrogate learning networks at
arbitrary points (within their regions of fit to prior data), thus
finding predicted values and sensitivities of process responses, is
the key to use of these networks in computer aided design and
manufacturing (CAD/CAM).

Use  of  Learning  Networks 

A learning network in CAM implements a predictive model for the
process being controlled (Fig. 4). A separate network is used for
each predicted variable. Inputs to each network are measured process
variables and trial control variables. When the switch is in search
position, sequence search logic can rapidly interrogate the networks
to discover predicted consequences of hypothetical control actions.
When the switch is moved to control, the first sequence of control
action found by interrogating the networks is transmitted through
actuators to the process being controlled.

The sequence may be recomputed as often as desired, most often for a
process that is subject to frequent disturbances. Conversely, the
more accurate the predictive model, the less frequently the system
must recalculate its control decisions.

Sequence search logic is a numerical optimization algorithm whose
input is a predicted score computed by performance assessment logic.
This score may consist simply of the magnitude of the arithmetic
difference between the desired final value of a process variable and
its predicted final value, in which instance the goal of the search
logic is to find a sequence that drives the score to zero. When
multiple variables are to be controlled to specified final values,
the score function may be a weighted sum of predicted absolute final
errors, in which the more important final variables are given greater
numerical weights than the less important variables. Other, more
sophisticated score functions may be computed to express such
characteristics as quality of steady state performance, transient
recovery from disturbances, or adherence to manufacturing
constraints. If flexible search logic such as a guided random search
algorithm is used, virtually any computable score function may be
employed to govern the search.
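
The weighted-sum score and a search over trial control values can be
sketched as follows. This is an illustration under assumptions: the
names `weighted_score` and `search_control` are hypothetical, a
single scalar control value stands in for a control sequence, each
network is represented by a callable `net(measured, u)`, and a plain
uniform random search is used in place of a guided random search.

```python
import random


def weighted_score(predicted, desired, weights):
    """Weighted sum of absolute predicted final errors (lower is better)."""
    return sum(w * abs(p - d) for p, d, w in zip(predicted, desired, weights))


def search_control(networks, measured, desired, weights,
                   bounds, trials=2000, seed=0):
    """Interrogate the networks with trial controls; return best (u, score)."""
    rng = random.Random(seed)
    best_u, best_s = None, float("inf")
    for _ in range(trials):
        u = rng.uniform(*bounds)  # hypothetical control action
        predicted = [net(measured, u) for net in networks]
        s = weighted_score(predicted, desired, weights)
        if s < best_s:
            best_u, best_s = u, s
    return best_u, best_s
```

Because the score is only ever evaluated, never differentiated, any
computable score function could be substituted for `weighted_score`
without changing the search loop.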

While the switch is in search mode, the learning networks are
interrogated at a very high rate, making it possible to try many
hypothetical sequences within a short period of time. Typically, for
software networks having approximately 50 elements, at least lifO
interrogations/s can be realized. Achieving a high interrogation rate
requires that the networks be efficiently programmed. For the most
demanding applications, peripheral computer hardware may also be
employed to implement the networks. Depending on the amount of
parallel circuitry used in the peripheral hardware, a
tenfold-to-thousandfold increase in speed may be realized.

Fig. 4 An application in computer-aided manufacturing. Key elements,
shown in color, set up an optimum sequence of control actions in
search mode, then apply these actions to the process when the switch
is thrown to control mode. If necessary, the system can return to
training mode for minor retraining from time to time.

When the predicted score is satisfactory, performance-assessment
logic switches the control system from search to control mode (the
system is in control mode most of the time), at which point
adaptation or retraining of the predictive networks may be possible,
using some of the system's computer resources. Adaptation is a fine
tuning of network coefficient values, keeping the connectivity
structure fixed, perhaps using a gradient or guided random search
algorithm. Retraining completely restructures the networks and
recalculates the coefficients to modify the ways in which network
input variables interact within the networks themselves. Adaptation
alone is usually sufficient, adjusting the coefficients in a
background mode of control computer utilization, that is, with a
lower priority than the control function, carried out only when the
computer is briefly idle. Sometimes, adaptation is needed so
infrequently that it can be performed entirely offline at, say,
monthly intervals.

Sensor verification logic, an important part of the
learning network system, establishes that each set of
process measurements submitted to the networks is
reasonably similar to the data patterns with which the
networks were trained. If this similarity is not present,
the process or sensor may be malfunctioning, or perhaps
the system requires further training. Programs
have been developed that perform the similarity test
automatically and deal with each possibility that can
arise; these programs use a data-clustering algorithm
to synthesize the test.
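
One way such a similarity test might look is sketched below: each training cluster is summarized by a center and a per-variable spread, and a new measurement set is accepted only if it lies close to at least one cluster. The function names, the threshold of 3.0, and the sample numbers are hypothetical; the actual programs are not described in this detail.

```python
import math

def normalized_distance(x, mean, std):
    """Distance from x to a cluster center, scaled per variable by std."""
    return math.sqrt(sum(((xi - mi) / si) ** 2
                         for xi, mi, si in zip(x, mean, std)))

def is_familiar(x, clusters, threshold=3.0):
    """True if x lies within `threshold` of any training cluster."""
    return any(normalized_distance(x, m, s) <= threshold
               for m, s in clusters)

# Hypothetical clusters found in the training data: (means, stds).
clusters = [([900.0, 0.10], [20.0, 0.01]),
            ([1100.0, 0.25], [30.0, 0.02])]

ok = is_familiar([910.0, 0.105], clusters)      # close to the first cluster
suspect = is_familiar([1500.0, 0.50], clusters)  # far from both clusters
```

A `False` result does not say which possibility applies (process fault, sensor fault, or a need for further training); it only flags the measurement set for the follow-up logic.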

Learning Network Example

Control of runout table cooling sprays in a hot-strip
steel-finishing mill has been achieved, although
this is a difficult industrial process for which to build
a mathematical model. It is difficult because it has many
process variables, strongly nonlinear interactions occur
between some of these variables, and the response to
a change in any input variable is delayed by a length
of time related to the velocity of the steel strip moving
through the mill and the physical length of the
mill itself. Yet modeling such a process is extremely
important, because it offers the best way to reduce
or eliminate substantial economic losses suffered when
either manual or conventional automatic controls are
applied.

Learning networks provide a new method of modeling
this process. They provide accurate, readily computable
relationships with which, as the values of
input and control variables are changed, outputs can
be predicted. Because they take time delays into account,
they are not sluggish as are linear controllers
for transport delay plants, and they therefore avoid
production of much off-specification material resulting
from the large amount of time that such controllers
require to reach their final states after input changes.
Also, unlike some linear controllers that are too closely
coupled, they do not become unstable.

Since it does not require that major nonlinear interactions
be precisely known in advance, the learning
network method is more successful than previously
applied linear regression techniques. Also, unlike
classical regression techniques, it is unlikely to go
astray with real process data after having been made
to work well with a different set of test data, because
overfitting is avoided.

A typical hot-strip finishing mill consists of six roll
stands, a runout table several hundred feet long with
water sprays both above and below it, and a coiling
device at its end. A steel bar about 1 in. thick and
heated to approximately 1900°F enters the first roll
stand, which squeezes it into a somewhat thinner and
wider bar that is moving substantially faster than when
it entered the stand (part of the steel displaced by
squeezing goes sideways, part of it moves forward to
contribute to the velocity of the bar). The other roll
stands repeat the process in succession, so that the
output strip is perhaps 0.1 in. thick, cooled to
[illegible] (but still red hot) and moving at up to
2000 ft/min. As it moves down the runout table, the
strip must be cooled by the sprays to a specific
temperature before it is coiled.

For illustrative purposes, rather than showing how the
method may be applied to an entire rolling mill, we
discuss here only its application to control pressurized
water sprays used to cool the hot steel strip after it has
passed the last roll stand. (These sprays prepare the
strip for coiling.) A typical mill contains many such
sprays, typically arranged in about 15 discretely
controllable spray banks; they affect several parameters in
the strip being produced, which can take several seconds
to pass from the last roll stand to the coiling device.

In this example, the model, which is programmed on
an IBM 1800 process control computer, predicts the
number of sprays required to achieve the desired coiling
temperature. It has seven inputs: coiling and finishing
temperatures; the strip's speed, thickness, width, and
hardness; and spray pressure. (Finishing temperature
is at the last roll stand; the difference between the two
temperatures represents the heat that must be removed
by the sprays.) Outputs are number and configuration
of the sprays, chosen from eight above the table and
seven underneath.

Reaction of the bar to the successive rolling steps is
initially assumed and the sprays are preset when the
bar enters the first roll stand, according to a prediction
based on this assumption. As the partially rolled strip
emerges from the fourth stand, the predicted sprays are
actually turned on; they take a few seconds to build up
to their full volume, while the strip traverses the last two
stands. When the strip emerges from the sixth and last
stand, the prediction is made again on the basis of
actual measured temperature and speed of the strip.
If the assumed behavior of the strip was valid, the
original prediction would have been correct and the
proper sprays would now be operating; but if the
measured variables depart in any way from their assumed
values, the number of sprays is modified accordingly.

The model also predicts the temperature of the cooled
strip as it begins coiling. If the actual temperature at
this point differs markedly from that predicted, the
model may need modification or further training, or the
process sensors or actuators may not be working properly.
In practice, [illegible] of a sample run of [illegible]
coils fell within an acceptable tolerance of [illegible],
corresponding to a prediction error within ±2 sprays.
Furthermore, 93.5% of the coils were within ±1 spray of
the ideal setting, substantially better than the
performance of conventional controllers, especially when
product specifications are frequently changed. (The
learning network model that produced this performance was
synthesized from building blocks according to the
procedure outlined earlier.)

Other areas where learning networks have been applied
or are currently being developed for a variety of
CAM processes include inference of surface finish
roughness in machining, ultrasonic nondestructive testing,
fermentation process modeling, crystallization process
modeling, and modeling of casting quality in aluminum die
casting.


Conclusion


Results for control of the cooling sprays on the runout
table of a hot-strip steel-finishing mill verify the utility
of the learning network approach, even with conventional
sensors. Since networks can be trained to make associations
that ordinary modeling procedures cannot make,
nonconventional sensors might be useful or even essential
in other applications, demonstrating the power of the
learning network approach even more dramatically.

Software realizations of learning networks provide
adequate computing speed in many situations. For future
applications requiring faster inferences or predictions,
large-scale integrated microcircuits are being
developed. These will lead to very flexible and powerful
peripheral devices for the central processors to
which they are attached.

Although CAM applications of learning networks
have been emphasized here, CAD applications are also
receiving attention. Whereas in CAM the network is
trained to a high degree of accuracy before being used
online, in CAD it learns with each successive design
experiment, and with search logic it performs as a
designer in seeking out the best design.
INTRODUCTION


Mucciardi  [1]  has  developed  several  procedures  for  aug- 
menting cluster  classifiers  with  trainable,  polynomial  networks. 
The  networks  transform  the  x-input  data  into  a y-domain 
where  the  classes  are  more  easily  and  accurately  clustered. 

The classifier proposed here employs the same concept of
transforming the X data into a "more clusterable" y domain. With
the Mucciardi approach, at least one network is required for
each class, so in many-class problems, the number of networks
is large. An attempt is made here to reduce the number of
nets required to perform an adequate transformation, so that
classification problems involving many classes, e.g., twenty-five
or more, become more tractable.


A NETWORK/CLUSTER CLASSIFIER


THE  PURE  CLUSTER  CLASSIFIER 

When the cluster algorithm is used by itself, i.e.,
with no network augmentation, as a classification tool,
it is "trained" as follows: The training data (the X
vectors) for the K classes are separated according to class.
For each class, the cluster algorithm constructs hyperellipses,
defined by means $\mu_{kn}$ and standard deviations $\sigma_{kn}$,
of data which cluster closely. The subscript k denotes class and
n denotes the component of the X vector. The locations and
dimensions of these hyperellipses, along with the associated
classes, comprise the information used by the classifier in the
classification mode.

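A block diagram of the cluster classifier is shown
in Figure 1. Its operation, given an input vector X, is
to generate a normalized distance measure indicating how far
the vector X lies from each ellipse:

$$ d_{ki}^2 = \sum_{n=1}^{N} \left( \frac{x_n - \mu_{kin}}{\sigma_{kin}} \right)^2 \qquad (1) $$

Here $\mu_{kin}$ and $\sigma_{kin}$ are the mean and standard
deviation of the nth component for the ellipse in question.
The subscript i denotes the ith cluster within the kth class.
(Table 1 gives a definition of the indices used in this
note.)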


The unnormalized probability of belonging to that
hyperellipse is

$$ p'_{ki} = \exp\left( -d_{ki}^2 \right) \qquad (2) $$

Table 1: Definition of Indices

    i    subscript indicating ellipse number
    I    total number of ellipses per class
    j    subscript indicating class number
    K    total number of classes
    k    subscript indicating class number
    L    total components in the y vector
    l    subscript indicating y vector component
    M    total number of entries (in the training set) per class
    m    subscript indicating entry number
    N    total components in the x vector
    n    subscript indicating x vector component


and the normalized probability is

$$ p_{ki} = \frac{p'_{ki}}{\sum_{k=1}^{K} \sum_{i=1}^{I} p'_{ki}} \qquad (3) $$


The probability of belonging to class k is the sum of all
the class k clusters.

$$ p_k = \sum_{i=1}^{I} p_{ki} \qquad (4) $$

The class probabilities $p_k$ form a class probability-state-vector
(PSV).
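
The pure cluster classifier can be sketched in a few lines. The sketch assumes that the distance of Equation 1 is a sum of squared, standard-deviation-scaled differences and that the unnormalized probability is exp(-d^2), by analogy with Equation 6; the function name `cluster_psv` and the sample ellipses are hypothetical.

```python
import math

def cluster_psv(x, ellipses):
    """Return the class probability-state-vector for input vector x.

    ellipses[k] is a list of (means, stds) pairs, one per ellipse of class k.
    """
    unnorm = []
    for class_ellipses in ellipses:
        row = []
        for means, stds in class_ellipses:
            # Normalized squared distance from x to this ellipse.
            d2 = sum(((xn - mu) / sd) ** 2
                     for xn, mu, sd in zip(x, means, stds))
            row.append(math.exp(-d2))  # unnormalized probability
        unnorm.append(row)
    total = sum(sum(row) for row in unnorm)
    # Class probability = sum of its normalized ellipse probabilities.
    return [sum(row) / total for row in unnorm]

ellipses = [
    [([0.0, 0.0], [1.0, 1.0]), ([3.0, 0.0], [1.0, 0.5])],  # class 1
    [([0.0, 4.0], [0.5, 1.0])],                            # class 2
]
psv = cluster_psv([0.2, 0.1], ellipses)
```

Every ellipse of every class is visited for each input, which is why a growing number of clusters per class bears directly on classification cost.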

Several problems may occur using the above classification
routine:

1)  If any of the variables in the X vector is
totally random, i.e., independent of the true classification
information, several X data points (within a class)
which would otherwise cluster will be artificially separated
in hyperspace due to variation on the random variable.
Excess clusters are introduced to handle separated data,
so more complex classification logic is required. Further,
new data with values of the random variable which do not
fall within the training set regions will not necessarily be
classified properly. An example is shown in Figure 2.


2)  If various X data points (within a class) do
not fall within elliptically shaped regions, several
ellipses must be used to approximate the region. This
increases the complexity of the cluster classifier, often
introducing large numbers of clusters per class. An example
is shown in Figure 3.

3)  Since  the  clusters  for  each  class  are  configured 
without  reference  to  the  cluster  structure  of  the  other 
classes,  clusters  for  two  different  classes  may  overlap 
even  though  the  x vectors  from  the  two  classes  are  quite 
separable.  An  example  is  shown  in  Figure  4.

4)  A problem arises with the use of cluster algorithms
which generate ellipses without tilt (i.e., the variance
terms are allowed to vary but the covariance terms are
restricted to zero). Data which may cluster nicely with the
use of tilted ellipses require the use of several smaller
ellipses or, if one ellipse is used, it may subsume more
space than the data legitimately occupy. An example is
shown in Figure 5. (Note: a problem with a cluster
algorithm which permits "tilting" is that the number of
covariance terms per ellipse is N(N - 1)/2 while the number
of variance terms is only N. Thus the memory and computational
requirements increase very rapidly with increasing
dimensionality.)
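
The growth noted in item 4 is easy to tabulate. The helper below is purely illustrative (the name `ellipse_parameter_counts` is not from the source); it counts the N variance terms and the N(N - 1)/2 covariance terms needed per ellipse.

```python
def ellipse_parameter_counts(n):
    """Return (variance terms, covariance terms) for an n-dimensional ellipse."""
    variances = n
    covariances = n * (n - 1) // 2
    return variances, covariances

counts = {n: ellipse_parameter_counts(n) for n in (2, 10, 50)}
# counts == {2: (2, 1), 10: (10, 45), 50: (50, 1225)}
```

At N = 50 a tilted ellipse already needs 1225 covariance terms against 50 variance terms, which is the motivation for seeking a transformation in which untilted ellipses suffice.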
A NETWORK/CLUSTER CLASSIFIER


To overcome the above mentioned difficulties in high-dimensional,
many-class classification, it is proposed that
the cluster classifier of Figure 1 be modified as shown in
Figure 6. A group of L networks ($N_1, N_2, \ldots, N_L$) is inserted
between the X input and the distance measurement function,
and there is only one hyperellipse-distance-detection per class.
The purpose of the networks is:

a)  To reduce the dimensionality of the cluster space,
i.e., to transform the N-dimensional X vector space
down to an L-dimensional (L << N) space, and

b)  To transform X in a nonlinear fashion so that

i)  the Y vectors group in a single cluster per class,

ii)  each  of  the  class  clusters  may  be  adequately 
represented  by  a non-tilted  ellipse,  and 

iii)  the  class  clusters  are  well  separated  in  the 
Y space. 

By performing this transformation, cluster discrimination is
simpler (due to lower dimensionality and the existence of
only one cluster per class) and the problem of cluster overlap
has been reduced.


OPERATION OF THE NETWORK/CLUSTER CLASSIFIER


In its classification mode the network/cluster classifier
operates according to the following procedure:


1)  Compute the Y vector from the X vector with the
use of network transformation. Each component
$y_l$ is a polynomial function of the X vector.


2)  Compute the normalized distance measure, in y
space, to the K class clusters:

$$ d_k^2 = \sum_{l=1}^{L} \left( \frac{y_l - \mu_{kl}}{\sigma_{kl}} \right)^2 \qquad (5) $$
Figure  6:  Network/Cl


The y data come from step 1. The class mean $\mu_{kl}$ and variance
$\sigma_{kl}^2$ come from the training process, explained
later.

3)  Compute the unnormalized class probabilities:

$$ p'_k = \exp\left( -d_k^2 \right) \qquad (6) $$


This computation assumes that the class probability-density-functions
are Gaussian in y space. It must therefore
be an objective of the training function to generate
networks which map the training data (which may be irregularly
spaced in the x domain) into Gaussian distributions in
the y domain.

4)  Compute the normalized class probabilities:

$$ p_k = \frac{p'_k}{\sum_{k=1}^{K} p'_k} \qquad (7) $$


The last step produces the desired classification
probability-state-vector.
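
The four steps above can be sketched as follows. The two polynomial networks, the class means and variances, and the name `classify` are hypothetical stand-ins for trained quantities; the distance is assumed to be the variance-normalized squared distance in y space.

```python
import math

def classify(x, networks, class_means, class_vars):
    """Return the class probability-state-vector for input vector x."""
    # Step 1: transform x into the lower-dimensional y vector.
    y = [net(x) for net in networks]
    # Step 2: normalized squared distance from y to each class cluster.
    d2 = [sum((yl - mu) ** 2 / var
              for yl, mu, var in zip(y, means, variances))
          for means, variances in zip(class_means, class_vars)]
    # Step 3: unnormalized class probabilities.
    unnorm = [math.exp(-d) for d in d2]
    # Step 4: normalize so the probabilities sum to one.
    total = sum(unnorm)
    return [p / total for p in unnorm]

# Hypothetical example: N = 3 inputs mapped to L = 2 outputs, K = 3 classes.
networks = [
    lambda x: 0.5 * x[0] + 0.5 * x[1] + 0.1 * x[0] * x[1],
    lambda x: x[0] - x[1] + 0.2 * x[2] ** 2,
]
class_means = [[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]]
class_vars = [[0.25, 0.25], [0.25, 0.5], [0.5, 0.5]]
psv = classify([2.0, 1.5, 1.0], networks, class_means, class_vars)
```

With one cluster per class, the work per input is L network evaluations plus K distance computations in L dimensions, instead of one distance per ellipse in N dimensions.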

NETWORK/CLUSTER  CLASSIFIER  TRAINING 

It  is  the  objective  of  the  training  function  to 
develop  networks  which  map  data  in  the  x domain  to  the  y 
domain where:
a)  all  the  data  from  a given  class  k form  a single
Gaussian  distribution  in  y space,  and 

b)  the  K class  distributions  in  y space  are  sufficiently 
separated  that  inter-class  ambiguities  are  minimised. 


The  classifier  synthesis  technique  is  to  select  a set 
of  L polynomial  networks  with  "sufficiently  rich"  generality 
that  an  "adequate"  transformation  from  the  x domain  to  the  y 
domain  can  be  achieved  if  the  proper  coefficients  are  selected. 
The  coefficients  are  selected  using  a GARS  algorithm.  The 
performance  measure  for  the  search  is  mathematically  complex, 
but  it  simply  measures  how  well  the  networks  are  clustering  the 
within-class  data  and  separating  the  different-class  clusters. 


Within each class, the means and variances of the
data in the y domain are

$$ \mu_{kl} = \frac{1}{M} \sum_{m=1}^{M} y_{klm} \qquad (8) $$

where $y_{klm}$ is the lth component of the y vector for the
mth training entry of class k.

The "separation", which is a unitless measure, of class j
from class k is given by the Mahalanobis $D^2$ metric: the squared
differences of the means normalised by the "spreads" or
variances of the class data around their respective means
(summed over all L dimensions):
The aggregate separation S is given by the "parallel" combination
of all the intercluster separations:

$$ S = \frac{K(K-1)}{\sum_{k=1}^{K-1} \sum_{j=k+1}^{K} \dfrac{1}{d_{jk}}} \qquad (11) $$
The parallel combination of d places maximum performance
benefit on increasing the smallest inter-class separations, with
minimum emphasis on increasing separations which are already large.


The  performance  assessment  (PA)  function  for  the  GARS 
algorithm  is  to  maximize  the  aggregate  separation  S. 


Note  that  the  PA  function  places  no  constraints  on  the 
location  of  clusters  in  y space.  Any  network  solution  for 
which  the  ratio  of  the  cell  sizes  to  the  intercell  distances 
(see Equation 10) is small is sufficient for unambiguous
classification.


Once  a sufficient  value  for  S has  been  obtained  the 
values  of  cluster  means  and  variances  are  obtained  directly 
from  Equations  8 and  9.  These  values  are  used  when  the 
classifier  is  used  in  the  classification  mode.  A matrix 
representation  of  these  data  is  shown  in  Figure  6a. 
Also  of  interest  is  the  separation  matrix,  shown  in
Figure  6b,  which  gives  the  interclass  separation  (from  Equation 
10)  for  all  classes.  This  indicates  which  classes  are  likely 
to  be  confused  and  which  are  not. 
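
A small sketch of the training-time bookkeeping follows. Several forms here are assumptions, since the corresponding equations are not legible: the variance divides by M, the separation of two classes is taken as the squared mean difference divided by the sum of the two class variances (summed over the L dimensions), and the aggregate separation is taken as K(K - 1) divided by the sum of reciprocal pairwise separations. All function names and data are hypothetical.

```python
def class_stats(y_rows):
    """Per-component mean and variance of one class's y vectors."""
    m = len(y_rows)
    dims = len(y_rows[0])
    means = [sum(row[l] for row in y_rows) / m for l in range(dims)]
    variances = [sum((row[l] - means[l]) ** 2 for row in y_rows) / m
                 for l in range(dims)]
    return means, variances

def separation(stats_j, stats_k):
    """Assumed separation: squared mean gaps scaled by summed variances."""
    (mu_j, var_j), (mu_k, var_k) = stats_j, stats_k
    return sum((a - b) ** 2 / (va + vb)
               for a, b, va, vb in zip(mu_j, mu_k, var_j, var_k))

def aggregate_separation(stats):
    """Parallel (reciprocal-sum) combination over all class pairs."""
    k = len(stats)
    recip = sum(1.0 / separation(stats[j], stats[i])
                for i in range(k - 1) for j in range(i + 1, k))
    return k * (k - 1) / recip

# Hypothetical y-domain training data for K = 3 classes, L = 2.
y_by_class = [
    [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]],
    [[2.0, 1.0], [2.2, 1.2], [1.8, 0.8]],
    [[-1.0, 3.0], [-1.2, 3.1], [-0.8, 2.9]],
]
stats = [class_stats(rows) for rows in y_by_class]
sep_matrix = [[0.0 if a is b else separation(a, b) for b in stats]
              for a in stats]
S = aggregate_separation(stats)
```

Because reciprocals are summed, the smallest entry of `sep_matrix` dominates `S`, so a search that maximizes `S` is pushed toward pulling the most easily confused classes apart first.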
Abstract

A hardware approach has been developed for the
realization of electronically programmable
arrays for use in a variety of signal-processing
functions. Such arrays comprise a network
of elements whose input-output relationship,
interconnectivity and constants may all be controlled
or varied to solve particular problems.

Each element will be given the capability of
evaluating any one of five polynomial expressions
ranging from a linear combination of five variable
inputs to the complete multinomial
of two variables. The basic numerical format
for all inputs, outputs and weights may be
designated as fixed or floating point, with up to
a 24-bit mantissa and 8-bit exponent.

It is shown how such an element can be multiplexed
to simulate layers and then nets, or how a whole
layer or net can be populated. The latter configurations
provide higher throughput and reliability
at the expense of arithmetic-circuit complexity.
Factors of up to 1000 to 1 in speed improvement
are realised with the reported hardware approach
when compared to using general-purpose computers.


INTRODUCTION 

It has been shown that arrays of calculating
elements, providing various functions of their
input variables, can be used as general-purpose
signal processors. Such elements could, in
general, operate on binary, analog, or numerical
inputs. Networks or arrays composed of elements
in each of these domains have advantages in
particular applications. The numerical processing
element and its use in programmable arrays is the
subject of this paper.

The major potentials of such arrays lie in their
ability to perform reliable high-speed processing,
and their ability to be used in adaptive control
systems.

The function repertoire of the basic element has
been defined as a family containing both linear
and non-linear multinomial expressions. The
particular function of the element at a given
time, its interconnectivity within an array,
and the coefficients or weights of its multinomial
terms are all controllable. These features
provide an array of such elements with a high
degree of flexibility, and allow such an array
to be alternately programmed to solve a large
variety of problems. Also, such an array can be
"trained" in that it can rapidly accept different
sets of weights and connection commands until it
represents a transformation acceptable as a
solution to a given problem. Although limited
versions of such arrays have been built in hardware,
they have been simulated, for the most part,
in software. This has confined the application
of such arrays to off-line processing or
experimental work.

With the objective of efficient array realization,
an analysis of programmable array hardware requirements
has been made, tradeoffs in its implementation
have been conducted, and an optimum architecture
has been determined. Based on state-of-the-art
LSI technology, projections as to array performance
were derived. This derivation led, in turn, to
the definition of a custom CMOS/SOS LSI circuit
which would serve as a key ingredient in the
processor of the element. This circuit and its
use in programmable arrays will be described.


PROCESSING  ELEMENT  AND  ARRAY  STRUCTURE 


A basic element is depicted in Fig. 1(a); the
figure shows its major control and input and
output signals. Such elements are, in general,
structured in multi-layered nets, as shown in
Fig. 1(b). The interconnection control, which
is part of each element, but which appears
functionally between successive layers of elements,
has inputs which can route the output signals of
any element to the desired input of an element
in the next layer. Each element and interconnect
in the array has sufficient memory for
storing the function type it is to compute, the
necessary weights or coefficients of its function,
and the interconnect data. Hence, this programmable
array can be considered as a distributed
logic-memory processor capable of rapid
reconfiguration and calculation of complex
transformations.

Fig. 1 - Programmable-function array structure.

Before describing the element requirements and
architecture, the three possible configurations
of programmable arrays will be explained; these
configurations are shown in Fig. 2. In Fig. 2(a),
a net of dimension j x k is shown fully populated,
with each element containing an m-bit store for
necessary function and control. This configuration
can process (or transform) j/2 inputs at a time
and can be operated in a pipeline mode so that k
sets of data are simultaneously operated upon in
different layers.

In Fig. 2(b), a single layer of elements can be
multiplexed to simulate a whole net. The memory
of each element must now have km bits to provide
the necessary control as the processor of the
element acts, in turn, to realize each of the k
layers. For any single problem, the speed is the
same as in Fig. 2(a), but pipelining cannot be
used. In Fig. 2(c) a single processing element
with jkm bits of storage can be used to process
inputs within a layer and then successive layers.
Provisions must also be made for storage of
intermediate results in Figs. 2(b) and (c);
approach (c) will have 1/j the speed of (b).
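
The trade-off among the three configurations can be tabulated directly from the figures quoted above. The function below is an illustrative sketch: the name `array_configurations`, the dictionary keys, and the sample values j = 16, k = 16, m = 256 are assumptions, and "relative speed" means single-problem speed relative to the fully populated net.

```python
def array_configurations(j, k, m):
    """Summarize the three configurations for a j x k net, m bits per function."""
    return {
        "a_full_net": {
            "elements": j * k, "bits_per_element": m,
            "relative_speed": 1.0, "pipelined": True,
        },
        "b_single_layer": {
            "elements": j, "bits_per_element": k * m,
            "relative_speed": 1.0, "pipelined": False,
        },
        "c_single_element": {
            "elements": 1, "bits_per_element": j * k * m,
            "relative_speed": 1.0 / j, "pipelined": False,
        },
    }

configs = array_configurations(j=16, k=16, m=256)
total_bits = {name: c["elements"] * c["bits_per_element"]
              for name, c in configs.items()}
```

The total control storage is the same jkm bits in all three cases; what changes is how many arithmetic processors share it, and therefore the speed and the package count.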



It is envisioned that arrays of up to 256 elements
(or equivalent) could provide the processing
capacity for a vast majority of applications.
Hence, the design, using either the configuration
in Fig. 2(b) or (c), will have the capacity for
memory expansion to simulate nets of up to 256
elements. These arrays will, in general, be
loaded and controlled by a computer, but may have
input/output signal interfaces as well, as shown
in Fig. 3. As such, the programmable arrays can
be considered as firmware, reconfigurable upon
command, for high-speed processing.

Fig. 3 - Programmable array environment.

ELEMENT DEFINITION AND DESCRIPTION

The basic numerical processing element of the
programmable array will have the capability to
perform the following functions:

P1:  $y = W_0 + W_1 X_1$

P2:  $y = W_0 + W_1 X_1 + W_2 X_2 + W_3 X_1 X_2$

P3:  $y = W_0 + W_1 X_1 + W_2 X_2 + W_3 X_1 X_2 + W_4 X_1^2 + W_5 X_2^2$

P4:  $y = W_0 X_2 - X_1 X_2^2$

P5:  $y = W_0 + W_1 X_1 + W_2 X_2 + W_3 X_3 + W_4 X_4 + W_5 X_5$

Fig. 2 - Possible array configurations: (a) full net
(max throughput); (b) full layer (min delay);
(c) single element (min hardware).


Functions P1, P2, and P3 are useful for general-
purpose linear and non-linear hyper-surface
calculations or transformations. P4 was provided for
realizing division by a recursion formula. P5,
a linear combination of five variables, is ideally
suited for digital filters and linear transforms,
such as the FFT. Each of the five functions could
be realized by specifying only one element. For
example, P1 could be used twice to realize P2,
etc. However, substantially greater speed and
efficiency is achieved by providing a micro-
programmed control to optimally evaluate each
function.
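
The five element functions can be written out as a short software model. The sketch assumes the readings P3 = complete second-order polynomial in two variables and P4 = W0*X2 - X1*X2^2; the dispatcher name `evaluate` and the zero-based list layout (x[0] holds X1) are illustrative choices, not part of the hardware definition.

```python
def evaluate(function, w, x):
    """Evaluate one element function; w[0] is W0 and x[0] is X1."""
    if function == "P1":
        return w[0] + w[1] * x[0]
    if function == "P2":
        return w[0] + w[1] * x[0] + w[2] * x[1] + w[3] * x[0] * x[1]
    if function == "P3":
        return (w[0] + w[1] * x[0] + w[2] * x[1] + w[3] * x[0] * x[1]
                + w[4] * x[0] ** 2 + w[5] * x[1] ** 2)
    if function == "P4":
        return w[0] * x[1] - x[0] * x[1] ** 2
    if function == "P5":
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:6], x[:5]))
    raise ValueError("unknown function: " + function)

# P5 used as a five-tap filter: equal weights give a moving average.
moving_average = evaluate("P5", [0.0] + [0.2] * 5, [1.0, 2.0, 3.0, 4.0, 5.0])
```

In this model P5 with equal weights acts as one output sample of a five-tap moving-average filter, which is the kind of digital-filter use the text has in mind.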

The element will be given the capability of
operating on 32-bit floating-point values with a
24-bit mantissa and 8-bit exponent. It will also
be capable of operating on fixed-point values of
up to 24 bits with increased speed and lower
package count.

The basic element is shown in Fig. 4; it consists
of an arithmetic processor, an element controller,
and random-access memories (RAM's).

The arithmetic processor performs additions,
subtractions, and multiplications of the input
variables and weights to compute the various
polynomials. These operations may be either fixed or
floating point, depending on the requirements of
the array to be synthesized. Use of an iteration
scheme permits division to be simulated with
several elements.
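
One iteration scheme that fits this description is the Newton-Raphson reciprocal, sketched below. It assumes the function P4 has the form y = W0*X2 - X1*X2^2, so that with W0 = 2 each evaluation refines an estimate X2 of 1/X1; the names `p4`, `reciprocal`, and `divide`, and the choice of five iterations, are illustrative.

```python
def p4(w0, x1, x2):
    """Assumed form of element function P4."""
    return w0 * x2 - x1 * x2 ** 2

def reciprocal(a, guess, iterations=5):
    """Refine an estimate of 1/a; each pass models one P4 element."""
    r = guess
    for _ in range(iterations):
        r = p4(2.0, a, r)  # r <- r * (2 - a * r)
    return r

def divide(n, a, guess, iterations=5):
    """Quotient n / a as n times the iterated reciprocal of a."""
    return n * reciprocal(a, guess, iterations)

q = divide(3.0, 4.0, guess=0.2)  # converges toward 0.75
```

The iteration converges only when the starting guess lies between 0 and 2/a, and the error is roughly squared on each pass, so a handful of cascaded elements is enough for a 24-bit mantissa given a reasonable initial estimate.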


Fig. 4 - Basic element.

The element controller consists of a read-only
memory (ROM) containing micro-instructions for
implementation of the desired polynomial
repertoire and associated logic to translate
these micro-instructions into control signals
for the arithmetic processor and address
information for the RAM's.

The RAM's contain the polynomial function selection
and element interconnection information, polynomial
weights, and input/output variable storage
for the element. Element interconnection is
accomplished by specifying the RAM address of each
required input variable. Output variables are
stored in sequential RAM locations. If more than
one element is used to simulate an array, the
input variables might be stored in the RAM's
associated with another element. In this case,
one or more inter-element buses are provided to
exchange data between elements. Since any
element can access any previously generated output
in this manner, complete interconnection
flexibility is accomplished. An intra-element
bus provides flexible data routing within the
element. In operation, a computer "programs"
each element by loading the proper polynomial
select codes, input addresses, and polynomial
weights into the RAM's via the intra-element bus.
The computer then loads initial input variables
into the variable memory via the intra-element
bus, and provides an "execute" signal to the
element controller.

Each element controller will sequentially step
through the previously trained portions of its
associated memory, performing the desired element
operations and storing the results in the variable
memory. This execute phase is complete when the
elements detect a polynomial select code equivalent
to a halt instruction. At this point the element
controllers will output a "ready" signal to the
computer. Upon receipt of the ready signal, the
computer retrieves the output data by providing
output addresses directly to the element controllers
and receiving the output data via the intra-element
buses.


DESIGN APPROACHES

Given the preceding operational constraints, what
is the best technology and architecture to provide
a good balance between speed and complexity or
cost? The major hardware consideration impacting
these criteria is the multiplier implementation.

Two major organizational types of elements were
investigated, one using a high-speed parallel
multiplier (with and without pipelining) and the
other a high-speed serial/parallel multiplier.
Estimates as to total chip count and speed were
made for each approach and for variations within
each approach. Available LSI and MSI IC's are
considered along with key LSI multiplier chips
which would have to be developed for the overall
element realization.

The necessary IC's to make the desired programmable
arrays a viable processing scheme can be
realized in a variety of emerging technologies.
Very-high-speed LSI multipliers have been developed
or are in development in both bipolar and CMOS/SOS
form. The choice of IC technology for each
section of the element is dictated by availability
and performance of existing circuits (particularly
RAM's), as well as the best realization of any
required new LSI development. These factors led
to the definition of a CMOS/SOS serial/parallel
multiplier chip compatible in speed and logic with
associated control and memory. The serial nature
of the design capitalizes on the "on-chip advantages"
of SOS, and the high packing density of CMOS/SOS
results in significant package-count savings.

The parallel multiplier approach was based on an
expandable 8X8 CMOS/SOS array. Alternate parallel-
multiplier approaches were considered too expensive
or not a good match with the rest of the element.
Although there are faster multipliers, utilization
of their speed to achieve substantially higher
element throughput would require more elaborate
control logic and RAM's.

For highest speed, a 24 X 24 bit parallel array
multiplier, capable of pipeline (alternately
referred to as re-clocked or staged) operation
can be used. The developmental 8X8 CMOS/SOS
multiplier with a 100 nanosecond response time
can optimally form the building block of such
a multiplier as well as alternate approaches.


A 2-stage pipelined multiplier appears optimum
for that unit, with intermediate storage provided
by available CMOS registers. The effective
utilization of such a multiplier depends on a
considerable number of gates for bus switching
for its access from the RAM's and for scaling for
floating-point justification.

The serial/parallel multiplier approach, basically
a 1 X 24 accumulator, overcomes these disadvantages
and also allows for the realization of elements
over a relatively large range of performance
factors by means of multiplier duplication. On
the other hand, the parallel-array multiplier
locks the design into relatively high speed and
cost.

This serial/parallel multiplier approach will be
described, and performance factors will be
summarized and compared to the parallel multiplier.

LSI SERIAL MULTIPLIER CELL

The LSI serial multiplier cell currently under
development is shown in Fig. 5. The cell shown
calculates terms of the form ax + b, where a is
the multiplicand, x is the multiplier, and b is
the addend. The cell shown can handle addends
and multiplicands in two's-complemented form of
any desired length, and a multiplier in sign-
magnitude form containing a sign bit and up to
23 significant bits. Cells may be cascaded to
form higher-order terms or to handle larger
multipliers.

Multiplication is accomplished in the 23 adder/
latch stages by successively adding the contents
of the multiplicand. The outputs of stages 7,
15, and 23 are brought out so that the cell may
be efficiently used with 8, 16, or 24-bit
multipliers (the first bit is the sign). The
output of stage 1 is used to gain immediate
access to the product sign bit to facilitate
complementing operations on the product.


The Sign Hold lead in Fig. 5 inserts additional
sign bits into the two's complement multiplicand
input (equivalent to inserting zeros in front of
a sign-magnitude number), and causes the most
significant product bits to be shifted to the
output.

The cell can perform other functions in addition
to the basic ax + b operation. By placing zeros
in the multiplier register, the cell functions as
a register for the input. By placing
010000000000000000000000 in the multiplier
register, only stage 1 will function as an adder
while the other stages merely shift, causing the
cell to function as a serial adder and register.
Placing a 1 on the Negate Product lead results in
a serial subtractor and register.
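The operating modes listed above can be checked against a simple arithmetic model of the cell. The sketch below is not a gate-level description: it assumes the first magnitude bit of the multiplier carries weight 1 (chosen so that the pattern with a single 1 after the sign bit yields a plain add), it uses Python integers in place of serial two's-complement words, and the names `serial_cell`, `FRACTION_BITS`, `ZEROS`, and `UNITY` are illustrative.

```python
FRACTION_BITS = 22  # assumed weighting: the first magnitude bit has weight 1

def serial_cell(a, x_bits, b, negate_product=False):
    """Model of a*x + b with a sign-magnitude multiplier given as a bit string.

    x_bits: sign bit followed by 23 magnitude bits, most significant first.
    a, b:   Python ints standing in for two's-complement operands.
    """
    if len(x_bits) != 24 or set(x_bits) - {"0", "1"}:
        raise ValueError("multiplier must be 24 binary digits")
    sign, magnitude = x_bits[0], x_bits[1:]
    if sign == "1":
        a = -a                       # two's complementer acts on the multiplicand
    acc = 0
    for stage, bit in enumerate(magnitude):
        if bit == "1":               # this stage adds a shifted multiplicand
            acc += a << (FRACTION_BITS - stage)
    product = acc >> FRACTION_BITS   # arithmetic shift; drops low-order bits
    if negate_product:
        product = -product
    return product + b

ZEROS = "0" * 24                     # all-zero multiplier: output is the addend
UNITY = "01" + "0" * 22              # only the first stage adds

register_out = serial_cell(7, ZEROS, 5)
adder_out = serial_cell(7, UNITY, 5)
subtractor_out = serial_cell(7, UNITY, 5, negate_product=True)
```

Under these assumptions the three calls reproduce the register, serial adder, and serial subtractor behaviors respectively.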

The developmental cell contains the equivalent of
450 2-input logic gates, and will be fabricated
on a chip approximately 170 mils square by means
of a double-epitaxial silicon-on-sapphire CMOS
process. Expected maximum clock rate (based on
computer simulations) is 25 MHz. The cell
utilizes single-phase clocks, a single power supply,
is static in operation, and may be mounted in a
16-pin package.


The multiplier register is divided into three
serial-in/parallel-out registers to provide a
good compromise between multiplier input leads and
the time required to load the registers.


The two's complementer will complement the
multiplicand if the multiplier sign bit is a 1. This
operation results in a product which is in two's
complemented form, as demonstrated in Table I.

FIXED-POINT ELEMENT

Fig. 6 shows ten multiplier cells arranged as a
processor to implement the general second-order
multinomial (P3). Cell 1 functions as a 23-
stage delay register, cells 2 through 6 function
as multipliers, cells 7 through 9 function as
adder/multipliers, and cell 10 functions as an
adder/register. All six terms of the multinomial
are generated in parallel, and delays are
matched so that a single clock is used for the
entire processor. The ones complementer is
used to convert the processor output to sign-
magnitude form before storage in the X-Y variable
RAM. Only the 23 most significant bits of the
output are stored. (Truncation of the least
significant bits and use of a ones complementer
for the conversion to sign-magnitude form will
result in plus or minus one LSB error in the
stored output.)

Fig. 5 - LSI serial-multiplier cell.

To implement general-purpose arrays, gating
must be added to route data from element to element.
The element control logic can then control the
inter-cell data flow to form any given polynomial
in the repertoire.


FLOATING-POINT ELEMENT

A floating-point arithmetic element is shown in
Fig. 7. Floating-point arithmetic requires handling
of an exponent, scaling of mantissas so that
bits of equal significance are added together,
and left-justification of the output mantissa
(with corresponding correction of the output
exponent) to retain the desired floating-point
format.

Implementation of the floating-point element is
shown in Fig. 8. The mantissa processor is
similar to the processor of Fig. 6 except that
provision is made for scaling of the terms
before addition and for left-justification of
the output.

The exponent processor performs the following
operations:

a) Calculates the exponent for each term
(e_i) and stores the largest (e) in the
memory.

b) Calculates (e - e_i) for each term and
transmits this scaling information to
the scalers.

c) Adds the "number of overflow bits" output
of the overflow position detector to
e to form the y-exponent output (corrected
for left-justification).

Fig. 6 - Serial fixed-point processor.

Fig. 8 - Floating-point processor.

The scalers delay the clocks to cells 1 through 3
to line up the mantissas of the first order terms
with those of the second order terms. Each scaler
then provides (e - e_i) additional clock pulses.

Each additional pulse causes that term to be
shifted right one position and decreases the
significance of the bits in that term by a factor
of two. In this way the terms are shifted
relative to each other, so that bits of equal
significance appear at the word-parallel adder
inputs.
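The scaling step can be sketched in a few lines. This illustration assumes each term is given as an integer mantissa paired with its exponent, models each extra clock pulse as a one-position right shift, and uses the hypothetical name `align_and_add`; it ignores the serial timing of the actual hardware.

```python
def align_and_add(terms):
    """Align (mantissa, exponent) terms to the largest exponent and add them.

    Returns the summed mantissa and the common exponent e.
    """
    e = max(exponent for _, exponent in terms)
    total = 0
    for mantissa, exponent in terms:
        # (e - e_i) right shifts, one per additional clock pulse
        total += mantissa >> (e - exponent)
    return total, e

total, e = align_and_add([(0b1000, 3), (0b1000, 1)])
```

In this example the second term is shifted right by two positions, so 8 * 2^3 + 8 * 2^1 = 80 is represented as a mantissa of 10 with exponent 3.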

The overflow problem can be understood by assuming
a binary point after the most significant bit of
each mantissa. All mantissas are then equivalent
to numbers between decimal 1 and decimal 2. For
the general second order multinomial, it can be
shown that the maximum is decimal 34. This range
corresponds to a variation of six binary positions
in the location of the output mantissa MSB.
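One way to arrive at the stated maximum is to bound each of the six terms of the second order multinomial separately. The sketch below assumes every weight and variable mantissa has magnitude below 2, as stated, so a weight alone is below 2, a weight times one variable is below 4, and a weight times two variables is below 8. The term labels are illustrative.

```python
MANTISSA_LIMIT = 2  # each mantissa magnitude lies between 1 and 2

# Upper bound on the magnitude of each term of the second order multinomial.
term_bounds = {
    "w0": MANTISSA_LIMIT,
    "w1*x1": MANTISSA_LIMIT ** 2,
    "w2*x2": MANTISSA_LIMIT ** 2,
    "w3*x1*x2": MANTISSA_LIMIT ** 3,
    "w4*x1^2": MANTISSA_LIMIT ** 3,
    "w5*x2^2": MANTISSA_LIMIT ** 3,
}
sum_bound = sum(term_bounds.values())          # strict upper bound on the sum
msb_positions = (sum_bound - 1).bit_length()   # bits needed left of the binary point
```

The bounds add to 2 + 4 + 4 + 8 + 8 + 8 = 34, and an integer part of at most 33 occupies six binary positions.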

The output mantissa is shifted until the least
significant possible location of the MSB is
located in cell 10 with the remaining possible
"overflow" locations in the overflow position
detector. This detector uses combinational
logic to determine the MSB position and to
generate a binary number representing this
position. This binary number is used by the
exponent processor to correct the output
exponent, and by the left-justifier to
generate the additional shift pulses to left-
justify the output mantissa. The overflow
circuit inserts the output sign bit adjacent
to the output MSB to complete the formatting
of the output.
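The combined action of the overflow position detector and the exponent correction can be modeled as follows. This sketch assumes a non-negative integer mantissa whose nominal format is `width` bits wide and treats any bits above that width as overflow bits; the function name `left_justify` is hypothetical, and the sign-bit insertion is omitted.

```python
def left_justify(mantissa, exponent, width=24):
    """Shift out overflow bits and correct the exponent by the same count."""
    if mantissa < 0:
        raise ValueError("this sketch handles magnitudes only")
    # Number of bit positions by which the MSB exceeds the nominal width
    overflow_bits = max(mantissa.bit_length() - width, 0)
    return mantissa >> overflow_bits, exponent + overflow_bits

justified = left_justify(0b100000000, 3, width=8)   # one overflow bit
unchanged = left_justify(0b10100000, 3, width=8)    # already in format
```

The first call drops one overflow position and raises the exponent by one; the second leaves both values as they were.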


The output mantissa is then ones complemented
and stored in the X-Y variable memory along
with the output exponent. As in the case of
the fixed-point processor, the output mantissa
will contain an error of ± 1 LSB.

COMPARISON OF APPROACHES

Detailed package count and speed estimates were
made for the word-parallel/bit-serial processors
of Figs. 6 and 8 and for alternate fixed- and
floating-point processors by using a word-serial/
bit-parallel approach with a 2-stage parallel-
pipeline multiplier. These estimates are
summarized in Table II.


The results shown in the table illustrate
interesting trends in the word-parallel and
bit-parallel approaches. With the serial
multiplier, for example, circuit complexity varies
linearly with the size of the mantissas while
the product clock rate (based on a 25 MHz clock)
does not depend directly on mantissa size. The
complexity of a parallel multiplier, on the other
hand, varies with the number of partial products
which, in turn, vary as the square of the
mantissa size. The speed of the parallel
multiplier is limited by the propagation of carries
when the partial products are added to form the
total product, and thus also varies with the
square of the mantissa size. (In the table, a
100-nanosecond cycle time is assumed for 16-bit
mantissas while a 200-nanosecond cycle time is
assumed for 24-bit mantissas.)

Table II - Package Count and Speed Estimates for
Various Processors

In floating-point operations, scaling of mantissas
is accomplished in the bit-serial processors with
a minimum of hardware at the expense of speed.
Scaling in the bit-parallel processors conversely
requires little time, but significant hardware is
required.

The hardware constraints for the exponent processor
for either approach tend to balance. The bit-serial
mantissa processors (with longer processing times
per term) require the scaling information for all
terms to be available before any terms can be added,
while the bit-parallel mantissa processors
(handling the terms serially) only require the scaling
information for one term at a time, although the
processing time per term is shorter.

The associated MSI/SSI hardware is greater for the
parallel-pipeline multiplier because of the need
to control the flow of many parallel bits, while
the need to serially shift data into the serial
multiplier makes input/output and scaling occupy
significant portions of the total processing time
in the bit-serial processors.

The third column of Table II illustrates the
reduction in associated MSI/SSI circuitry for each
approach if an optimum second LSI chip-type were
developed.

SUMMARY AND CONCLUSIONS


Elements for use in programmable arrays have
been defined, and design approaches and
performance indicated for alternate organizations. By
giving an element the capability of evaluating
one of five multinomial expressions with pre-set
coefficients, upon command, and by providing means
for interconnecting arrays of elements in various
configurations, a number of signal processing
functions can be realized. These include:

Classification  and  Control 

Hyper-Surface Computation

Digital Filtering (Recursive, Transversal)

Transformations

The arrays can, in fact, be alternately and rapidly
re-configured to do any of the above types of tasks.
In that the transformation coefficients and
multinomial degree and form can be readily controlled,
an array can be trained to suit a particular
situation.

Two types of design approaches were studied for
realizing the hardware of a programmable element,
one based on a parallel-pipeline multiplier and
the other on a serial-type multiplier. In each
case the multipliers would be realized by
appropriate custom LSI, CMOS/SOS multipliers.

Other circuit requirements are given, based both
on a second LSI chip for control logic and on
available IC packages. Although a parallel-
pipeline multiplier can provide improved speeds
of from 2-to-1 to [illegible]-to-1 when compared to the
serial-type multiplier, package counts were
correspondingly greater. It was shown that the
IC package totals for an element, based on all
standard IC's besides the multiplier, were at
least three times greater for the parallel-
multiplier case. If a second custom LSI circuit
is specified, the package savings with the serial
multiplier would be at least 7 to 1 for floating
point and significantly greater for fixed-point
elements. Considering that most or all of the
speed differences between the two approaches can
be compensated for, when required, by using a
greater degree of parallelism in net organization
(Fig. 2), the serial-multiplier approach was
recommended for development. With a full 32-bit
floating point (24-bit mantissa, 8-bit exponent)
capability and a total of 10 of the custom LSI
CMOS/SOS serial-type multiplier packages
supported by 15 packages accommodating custom LSI
control chips, a full six-term multinomial of
2-inputs can be evaluated in 5.2 microseconds.

For 16-bit fixed point capability the control
packages reduce to two, and the computation period
reduces to 2.9 microseconds. The random-access
memory to support each element will depend on its
degree of multiplexing. For a fully populated array,
up to 3 RAM packages would be required per element.

The  advantages  of  such  arrays  include: 

1. High-speed capability for general-purpose
digital processing. The speed can be tailored by
providing varying degrees of parallelism of the
element in an array.

2. Capability for relieving software problems.
The array can be considered as firmware once a set
of coefficient and interconnect vectors have been
found, or are known, for a particular application.

3. Capability for providing fault-tolerant
computing. In general, alternate coefficient and
interconnect vectors exist to realize the same
transformation. Hence, if an element in a fully
populated array is faulty, reprogramming of the
array is possible by by-passing any faulty
element(s). The degree to which this by-passing
is done will, of course, depend upon the number
of elements available and the complexity of the
transformation.

INTRODUCTION

It has been shown that arrays of calculating elements,
providing various functions of their input variables, can be
used as general-purpose signal processors. Such elements
could, in general, operate on binary, analog, or numerical
inputs. Networks or arrays composed of elements in each
of these domains have advantages in particular
applications. The numerical processing element and its use in
programmable arrays is the subject of this paper.

The major potentials of such arrays lie in their ability to
perform reliable high speed processing, and their ability to
be used in adaptive control systems.

The function repertoire of the basic element has been
defined as a family containing both linear and nonlinear
multinomial expressions. The particular function of the
element at a given time, its inter-connectivity within an
array, and the coefficients or weights of its multinomial
terms are all controllable. These features provide an array
of such elements with a high degree of flexibility, and
allow such an array to be alternately programmed to
solve a large variety of problems. Also, such an array can
be "trained" in that it can rapidly accept different sets of
weights and connection commands until it represents a
transformation acceptable as a solution to a given
problem. Although limited versions of such arrays have
been built in hardware, they have been simulated, for the
most part, in software. This has often confined the
application of such arrays to off-line processing or
experimental work. With the objective of efficient array
realization, an analysis of programmable array hardware
requirements has been made, tradeoffs in its implementation
have been completed, and an optimum architecture has
been determined. Based on state-of-the-art LSI technology,
projections as to array performance were derived. This
derivation led, in turn, to the definition of a custom
CMOS/SOS LSI circuit which would serve as a key
ingredient in the processor of the element. This circuit and
its use in programmable arrays will be described.
Examples of array applications in aerospace are presented.


PROCESSING ELEMENT AND ARRAY
STRUCTURE

A basic element is depicted in Figure 1(a); the figure
shows its major control and input and output signals. Such
elements are, in general, structured in multilayered nets,
as shown in Figure 1(b). The interconnection control,
which is part of each element, but which appears
functionally between successive layers of elements, has inputs
which can route the output signals of any one element to
the desired input of an element in the next layer. Each
element and interconnect switch in the array has sufficient
memory for storing the function type it is to compute, the
necessary weights or coefficients of its function, and the
interconnect data. Hence, this programmable array can be
considered as a distributed logic/memory processor
capable of rapid reconfiguration and calculation of complex
transformations.

Before describing the element requirements and
architecture, the three possible configurations of programmable
arrays will be explained; these configurations are shown in
Figure 2. In Figure 2(a), a net of dimension j X k is shown
fully populated, with each element containing an m-bit
store for necessary function and control. This
configuration can process (or transform) j [illegible] 2 inputs at a time and
can be operated in a pipeline mode so that k sets of data
are simultaneously operated upon in different layers.

In Figure 2(b), a single layer of elements can be
multiplexed to simulate a whole net. The memory of each
element must now have km bits to provide the necessary
control as the processor of the element acts, in turn, to
realize each of the k layers. For any single problem the
speed is the same as in Figure 2(a), but pipelining cannot
be done. In Figure 2(c) a single processing element with
jkm bits of storage can be used to process inputs within a
layer and then successive layers. Provisions must also be
made for storage of intermediate results in Figures 2(b) and
(c). Approach (c) will have 1/j the speed of (b).
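The storage and speed tradeoff among the three configurations can be tabulated with a short calculation. The sketch below assumes a j X k net with m control bits per simulated element, takes the single-problem time of configurations (a) and (b) as the unit, and assumes configuration (c) runs at 1/j of that speed. The function name `configurations` and the dictionary keys are illustrative.

```python
def configurations(j, k, m):
    """Summarize element count, storage, and relative time for each layout."""
    return {
        "a": {"elements": j * k, "bits_per_element": m,
              "relative_time": 1, "pipelined": True},
        "b": {"elements": j, "bits_per_element": k * m,
              "relative_time": 1, "pipelined": False},
        "c": {"elements": 1, "bits_per_element": j * k * m,
              "relative_time": j, "pipelined": False},
    }

table = configurations(j=4, k=3, m=16)
# Total control storage is the same in every configuration.
total_bits = {name: row["elements"] * row["bits_per_element"]
              for name, row in table.items()}
```

The comparison makes the design choice visible: multiplexing trades processors for time and for larger per-element memories, while the total control storage stays at jkm bits.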

It is envisioned that arrays of up to 256 elements (or
equivalent) could provide the processing capacity for a
vast majority of applications. Hence, the design, using
either the configuration in Figure 2(b) or (c), will have the
capacity for memory expansion to simulate nets of up to
256 elements. These arrays will, in general, be loaded and
controlled by a computer, but may have input/output
signal interfaces as well, as shown in Figure 3. As such, the
programmable arrays can be considered as firmware,
reconfigurable upon command, for high-speed processing.


Element  definition  and  description 

The basic numerical processing element of the
programmable array will have the capability to perform the
following functions:

P1) y = w0 + w1 x1

P2) y = w0 + w1 x1 + w2 x2

P3) y = w0 + w1 x1 + w2 x2 + w3 x1 x2 + w4 x1^2 + w5 x2^2

P4) [expression illegible in source]

P5) [expression illegible in source]

Functions P1, P2, and P3 are useful for general purpose
linear and nonlinear hyper-surface calculations or
transformations. P4 was provided for realizing division by
a recursion formula. P5, a linear combination of five
variables, is ideally suited for digital filters and linear
transforms, such as the FFT. Each of the five functions
could be realized by specifying only one element. For
example, P1 could be used twice to realize P2, etc.
However, substantially greater speed and efficiency is
achieved by providing a microprogrammed control to
optimally evaluate each function.

The element will be given the capability of operating on
32-bit floating-point values with a 24-bit mantissa and 8-
bit exponent. It will also be capable of operating on fixed
point values of up to 24 bits with increased speed and
lower package count.

The basic element is shown in Figure 4; it consists of an
arithmetic processor, an element controller and random
access memories (RAM's).

The arithmetic processor performs additions,
subtractions, and multiplications of the input variables and
weights to compute the various polynomials. These
operations may be either fixed or floating point, depending on
the requirements of the array to be synthesized. Use of an
iteration scheme permits division to be simulated with
several elements.
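Division by iteration can be illustrated with the standard Newton-Raphson reciprocal recursion, which needs only multiplication and subtraction. This is an assumption for illustration: the recursion formula actually used in the element is not legible in the source, and the names `reciprocal` and `divide`, the starting estimate, and the iteration count are hypothetical.

```python
def reciprocal(d, r0, iterations=5):
    """Approximate 1/d by repeating r <- 2*r - d*r*r from a starting guess r0.

    The guess must satisfy 0 < d*r0 < 2 for the recursion to converge.
    """
    r = r0
    for _ in range(iterations):
        r = 2.0 * r - d * r * r      # multiply and subtract only
    return r

def divide(n, d, r0, iterations=5):
    """Form n/d as n times the iterated reciprocal of d."""
    return n * reciprocal(d, r0, iterations)

quotient = divide(6.0, 3.0, r0=0.3)
```

Each pass of the recursion could be assigned to one element, which is one way several elements in sequence could stand in for a divider.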

The element controller consists of a read-only memory
(ROM) containing microinstructions for implementation
of the desired polynomial repertoire and associated logic
to translate these microinstructions into control signals
for the arithmetic processor and address information for
the RAM's. The RAM's contain the polynomial function
selection and element interconnection information,
polynomial weights, and input/output variable storage for the
element. Element interconnection is accomplished by
specifying the RAM address of each required input
variable. Output variables are stored in sequential RAM
locations. If more than one element is used to simulate an
array, the input variables might be stored in the RAM's
associated with another element. In this case, one or more
inter-element buses are provided to exchange data
between elements. Since any element can access any
previously generated output in this manner, complete
interconnection flexibility is accomplished. An intra-element
bus provides flexible data routing within the element. In
operation, a computer "programs" each element by
loading the proper polynomial select codes, input addresses,
and polynomial weights into the RAM's via the
intra-element bus. The computer then loads initial input variables
into the variable memory via the intra-element bus, and
provides an "execute" signal to the element controller.

Each element controller will sequentially step through
the previously trained portions of its associated memory,
performing the desired element operations and storing the
results in the variable memory. This execute phase is
complete when the elements detect a polynomial select code
equivalent to a halt instruction. At this point the element
controllers will output a "ready" signal to the computer.
Upon receipt of the ready signal, the computer retrieves the
output data by providing output addresses directly to the
element controllers and receiving the output data via the
intra-element buses.


Approaches

Given the preceding operational constraints, what is the
best technology and architecture to provide a good balance
between speed and complexity or cost? The major
hardware consideration impacting these criteria is the
multiplier implementation. Two major organization types
of elements were investigated, one using a high-speed
parallel multiplier (with and without pipelining) and the
other a high-speed serial/parallel multiplier. Estimates as
to total chip count and speed were made for each
approach and for variations within each approach. Available
LSI and MSI IC's are considered along with key LSI
multiplier chips which would have to be developed for the
overall element realization.

The necessary IC's to make the desired programmable
arrays a viable processing scheme can be realized in a
variety of emerging technologies. Very high speed LSI
multipliers have been developed or are in development in both
bipolar and CMOS/SOS form. The choice of IC
technology for each section of the element is dictated by
availability and performance of existing circuits
(particularly RAM's), as well as the best realization of any
required new LSI development. These factors led to the
definition of a CMOS/SOS serial/parallel multiplier chip
compatible in speed and logic with associated control and
memory. The serial nature of the design capitalizes on the
"on-chip advantages" of SOS, and the high packing
density of CMOS/SOS (compared to bipolar
implementations) results in significant package-count savings. The
parallel multiplier approach was based on an expandable
8X8 CMOS/SOS array. Alternate parallel multiplier
approaches were considered too expensive or not a good
match with the rest of the element. Although there are
faster multipliers, utilization of their speed to achieve
substantially higher element throughput would require
more elaborate control logic and RAM's. For highest
speed, a 24X24 bit parallel array multiplier, capable of
pipeline (alternately referred to as reclocked or staged)
operation can be used. The developmental 8X8
CMOS/SOS multiplier with a 100 nanosecond response
time can optimally form the building block of such a
multiplier as well as alternate approaches.

A 2-stage pipelined multiplier appears optimum for that
unit, with intermediate storage provided by available
CMOS registers. The effective utilization of such a
multiplier depends on a considerable number of gates for bus
switching for its access from the RAM's and for scaling for
floating point justification. The serial/parallel multiplier
approach, basically a 1X24 accumulator, allows for the
realization of elements over a relatively large range of
performance factors by means of multiplier duplication.
On the other hand, the parallel array multiplier locks the
design into relatively high speed and cost.

This serial/parallel multiplier approach will be
described, and performance factors will be summarized
and compared to the parallel multiplier.

LSI serial multiplier cell

The LSI serial multiplier cell that has been developed is
shown in Figure 5. The cell shown calculates terms of the
form ax + b, where a is the multiplicand, x is the
multiplier, and b is the addend. The cell shown can handle
addends and multiplicands in two's-complemented form of
any desired length, and a multiplier in sign-magnitude
form containing a sign bit and up to 23 significant bits.
Cells may be cascaded to form higher order terms or to
handle larger multipliers.

Multiplication is accomplished in the 23 adder/latch
stages by successively adding the contents of the
multiplicand. The outputs of stages 7, 15, and 23 are brought out
so that the cell may be efficiently used with 8-, 16-, or 24-bit
multipliers (the first bit is the sign). The output of stage 1 is
used to gain immediate access to the product sign bit to
facilitate complementing operations on the product.

The multiplier register is divided into three
serial-in/parallel-out registers to provide a good compromise
between multiplier input leads and the time required to
load the registers.

The developmental cell contains the equivalent of 450
2-input logic gates, and has been fabricated on a chip
approximately 170 mils square by means of a double-epitaxial
silicon-on-sapphire CMOS process. The
maximum clock rate is about 20 MHz. The cell utilizes
single-phase clocks and a single power supply, is static in
operation, and may be mounted in a 16-pin package.

Fixed  point  element 

Figure 6 shows ten multiplier cells arranged as a
processor to implement the general second order
multinomial (P3). Cell 1 functions as a 23-stage delay
register, cells 2 through 6 function as multipliers, cells 7
through 9 function as adder multipliers, and cell 10
functions as an adder/register. All six terms of the
multinomial are generated in parallel, and delays are matched
so that a single clock is used for the entire processor. The
ones complementer is used to convert the processor output
to sign-magnitude form before storage in the variable
RAM. Only the 23 most significant bits of the output are
stored. (Truncation of the least significant bits and use of
a ones complementer for the conversion to sign-magnitude
form will result in plus or minus one LSB error in the
stored output.)
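
The origin of the one-LSB error can be seen in a small sketch. It is an illustration only, with hypothetical function names: it assumes that truncation is an arithmetic right shift of a two's-complement value and that the ones complementer simply inverts the bits of a negative result.

```python
def to_sign_magnitude_ones(value, drop_bits):
    """Truncate `drop_bits` low-order bits, then convert to
    (sign, magnitude), inverting the bits of negative values."""
    truncated = value >> drop_bits  # arithmetic shift rounds toward -infinity
    if truncated >= 0:
        return 0, truncated
    return 1, ~truncated            # ones complement equals -truncated - 1


def stored_value(value, drop_bits):
    """Signed value represented by the stored sign-magnitude word."""
    sign, magnitude = to_sign_magnitude_ones(value, drop_bits)
    return -magnitude if sign else magnitude
```

Under these assumptions a positive result is stored up to one LSB low (truncation), and a negative result up to one LSB high (the ones complement is one less in magnitude than the two's complement), which matches the plus-or-minus one LSB figure quoted above.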

To implement general purpose arrays, gating must be
added to route data from element to element. The element
control logic can then control the intercell data flow to
form any given polynomial in the repertoire.

Floating-point element

A floating-point arithmetic element can also be
implemented. Floating-point arithmetic requires handling of an
exponent, scaling of mantissas so that bits of equal
significance are added together, and left-justification of the
output mantissa (with corresponding correction of the
output exponent) to retain the desired floating-point
format.
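
The three floating-point steps named above (exponent handling, mantissa scaling, and left-justification) can be sketched as follows. This is a simplified model, not the element's design: values are represented as `mantissa * 2**exponent` with a signed integer mantissa, the 24-bit width `MANT_BITS` is an assumption, and the names `normalize` and `fp_add` are hypothetical.

```python
MANT_BITS = 24  # assumed mantissa width


def normalize(mant, exp):
    """Left-justify a mantissa into MANT_BITS bits, correcting the exponent."""
    if mant == 0:
        return 0, 0
    sign = -1 if mant < 0 else 1
    mag = abs(mant)
    while mag >= (1 << MANT_BITS):        # too wide: shift right, raise exponent
        mag >>= 1
        exp += 1
    while mag < (1 << (MANT_BITS - 1)):   # not justified: shift left, lower exponent
        mag <<= 1
        exp -= 1
    return sign * mag, exp


def fp_add(m1, e1, m2, e2):
    """Add two values m * 2**e after scaling to a common exponent."""
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    shift = e1 - e2
    # Scale the operand with the smaller exponent so that bits of
    # equal significance line up before the addition.
    scaled = abs(m2) >> shift
    m2 = scaled if m2 >= 0 else -scaled
    return normalize(m1 + m2, e1)
```

For example, `fp_add(*normalize(3, 0), *normalize(5, 0))` returns a left-justified mantissa and exponent whose product `m * 2.0 ** e` is 8.0.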

TRAINING  AND  ADAPTATION  OF 
MULTINOMIAL  NETWORKS 

We now consider the methods for training and adaptation
of multinomial networks that are constructed from
the elements described above. This is done with special
search algorithms, generally off-line, to specify network
configurations. Applications of multinomial networks
generally involve operations with sensor data that are
obtained as a result of "observing" a physical object,
process, or phenomenon. The classical approach to design
of computer models for inferences and predictions from
observations has been to determine all the relevant
characteristics, deterministic and/or statistical, of the process
being observed, and to use these measurements (and
assumptions) in design calculations. Very often the
structure of the model is presumed and the design takes the
form of calculating the values of certain parameters. Even
if the nature of the observed process changes, the structure
of the model often does not change, but the parameter
values are adjusted in response to measured changes in
the inputs or in the outputs.

In many important applications, the inputs (i.e., the
observables) are difficult to describe analytically. The
best or even a good structure for the model cannot be
determined a priori. In this case, it is desirable to have a
model structure that can adjust to representative inputs.
That is, the model is trainable both in its structure and in
its parameter values.

It is therefore desired to implement a general (usually
nonlinear) function of certain input variables which we
can call observables. Since little may be known about the
characteristics of the observables, the parameters of the
network are not known a priori. The network will have to
be trained with representative inputs. The questions are
now:

1.  How  should  the  element  parameters  be  adjusted? 

2.  How should the elements be interconnected and what
should their complexity (i.e., number) be?

To make the ideas clear, suppose that the input consists of
N observables, $x_1, x_2, \ldots, x_N$. Also suppose that the output
is a scalar whose value may be considered as the estimate
of some property of the input process. In general, y will be
some nonlinear function of the $x_i$'s as follows:

$$y = f(x_1, x_2, \ldots, x_N) \quad (1)$$

Polynomial (multinomial) approximation

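Under fairly general conditions, a function of N
variables may be expressed in an N-dimensional
Maclaurin series as follows:

$$y = a_0 + \sum_{i=1}^{N} a_i x_i + \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} x_i x_j + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ijk} x_i x_j x_k + \cdots \quad (2)$$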

In the most general case, the coefficients $a_0, a_i, a_{ij}, \ldots$ are
functions of time, but for many cases of interest, the
underlying characteristics of the x's do not depend on time
and consequently the coefficients are constants.

Two questions which arise in the use of Equations (1)
and (2) are:

1.  What should the observables or measurements $x_i$ be?

2.  How many terms in Equation (2) will provide an
acceptable approximation to the desired function, even
though this (true) function is not known?

The answer to the first question is: All those that are
thought to have a bearing on the desired output are used
initially, and the ones which trial shows to be of little use
are discarded. The second question is answered adaptively
by using a trainable nonlinear network whose complexity
determines the number of terms in Equation (2). The
trainable network consists of interconnected elements,
each of which implements a simple nonlinear function of
two inputs. The total network can be trained to provide an
acceptable approximation to Equation (2).

A basic element of the learning network is the two-input,
single-output device, illustrated in Figure 1 (and
previously described), which implements the following
function (P3) of its inputs $x_i$, $x_j$:

$$y = w_0 + w_1 x_i + w_2 x_j + w_3 x_i x_j + w_4 x_i^2 + w_5 x_j^2 \quad (3)$$

Networks of the basic element

Suppose now we consider two layers of such elements as
illustrated in Figure 7. It can be seen that each $z_i$ contains
pairwise products up to degree four. Note that the first
layer contains all possible pairs of three inputs $x_1$, $x_2$, $x_3$.
To implement a general multinomial (i.e., a polynomial in
many variables) expression, the number of elements in
each layer would have to grow as one proceeds deeper into
the network. However, it is found empirically that
acceptable approximations are obtained without this growth;
in fact, the number of elements in successive layers will
soon (after, say, two or three layers) decrease, until only a
few are left as inputs to the final network component
(which is an adder).
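
The way degree grows from layer to layer can be checked with a small sketch. It is illustrative only: the two-input element is taken to be the six-coefficient quadratic $w_0 + w_1 x_i + w_2 x_j + w_3 x_i x_j + w_4 x_i^2 + w_5 x_j^2$, the wiring (every pair of the previous layer's outputs feeds one element) is an assumption and may differ from Figure 7, and all function names are hypothetical.

```python
from itertools import combinations


def element(w, xi, xj):
    """Two-input quadratic element with coefficients w = (w0, ..., w5)."""
    w0, w1, w2, w3, w4, w5 = w
    return w0 + w1 * xi + w2 * xj + w3 * xi * xj + w4 * xi * xi + w5 * xj * xj


def layer(weights, inputs):
    """Apply one element to every pair of inputs (in combinations order)."""
    pairs = list(combinations(inputs, 2))
    if len(weights) != len(pairs):
        raise ValueError("one coefficient set is needed per input pair")
    return [element(w, a, b) for w, (a, b) in zip(weights, pairs)]


def two_layer(weights1, weights2, x):
    """Feed the first-layer outputs pairwise into a second layer."""
    return layer(weights2, layer(weights1, x))
```

With every element reduced to a pure product (only $w_3 = 1$), the inputs (2, 3, 5) give first-layer outputs 6, 10, 15 and second-layer outputs 60, 90, 150; the first of these is $x_1^2 x_2 x_3$, a term of degree four.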

The  known  data  set 

Now we turn to the matter of determining the
coefficients of each network element and the number and
interconnections of the elements.

These tasks are accomplished with a "known" data
base, that is, a data base for which the values of the
dependent variable are known. The steps involved are:

1.  Optimizing the coefficients in each element of the
first layer;

2.  Selection of those elements whose output is
acceptable (rejection of poor performers);

3.  Repetition of the process for each layer; and

4.  A global optimization (adaptation) of all coefficients
in all layers based upon the final output.

The known data base is divided into three independent
but statistically similar subsets:

1.  Fitting subset
2.  Selection subset
3.  Evaluation subset

The fitting subset is used to determine the coefficients of
the elements. The selection subset is used to reject the
poor performers. The fitting and selection subsets are also
used for the global optimization. The evaluation subset is
used to estimate the overall performance. Since the
evaluation subset was not used for network synthesis, the
performance on this subset is an accurate estimate of the
ability of the network to generalize to new, previously
unseen data.
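
One simple way to obtain three independent but statistically similar subsets is random assignment, sketched below. The equal-thirds split, the fixed seed, and the function name `split_three` are assumptions made for illustration; the text does not state how the division was actually performed.

```python
import random


def split_three(records, seed=0):
    """Shuffle the known data base and deal it into three subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    fitting = shuffled[0::3]
    selection = shuffled[1::3]
    evaluation = shuffled[2::3]
    return fitting, selection, evaluation
```

Every record lands in exactly one subset, so the evaluation subset stays untouched during network synthesis.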


Training the network

The element coefficient determinations are based, in
part, upon a least-squares fit to a desired output. Other
criteria are of course possible and are often used. Employing
a least-squares criterion, the elements are first adjusted by
a matrix algebraic procedure and then by a recursive
search (i.e., optimization) procedure. An outline of the
steps follows.

The fitting and selection subsets are used alternately in
training each layer. The fitting subset is used first to
establish the coefficients. The specific observables to be
used initially have already been chosen. Let these be
designated by $x_1, x_2, x_3, \ldots, x_N$. These are arranged in
pairs, $x_i$, $x_j$; $i, j = 1, \ldots, N$. There are $N(N-1)/2$ such
pairs. Thus, the first trial will require $N(N-1)/2$ trainable
elements (such as that shown in Figure 1). A pair of
observables is sent to each element. The coefficients of the
element are determined using a recursive search procedure
with a least-squares criterion. The procedure is repeated
for each of the $N(N-1)/2$ elements.
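
The first-layer fit can be sketched in plain Python. The sketch covers only a matrix-algebra least-squares step (the normal equations solved by Gaussian elimination); the recursive search refinement mentioned in the text is not reproduced. The element is assumed to be the six-coefficient quadratic in two inputs, and all function names are hypothetical.

```python
from itertools import combinations


def features(xi, xj):
    """Terms of the assumed two-input quadratic element."""
    return [1.0, xi, xj, xi * xj, xi * xi, xj * xj]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    No check is made for a singular system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_element(xi, xj, y):
    """Least-squares coefficients of one element (normal equations)."""
    rows = [features(a, b) for a, b in zip(xi, xj)]
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(6)] for p in range(6)]
    aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(6)]
    return solve(ata, aty)


def fit_first_layer(columns, y):
    """Fit one element per pair of observables: N(N-1)/2 elements."""
    return {(i, j): fit_element(columns[i], columns[j], y)
            for i, j in combinations(range(len(columns)), 2)}
```

Here `columns` holds one list of fitting-subset values per observable and `y` the desired outputs; the returned dictionary has one coefficient set per pair of observables.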

Not all of the pairwise combinations are of significant
aid in extracting the desired information. The selection
process is inserted into the first layer, resulting in
$R(R-1)/2$ pairs of inputs into the $R(R-1)/2$ initial elements
of the second layer (R being the number of first-layer outputs
that survive selection). The coefficients of each element in the
second layer are determined as in the first layer. Then the
selection subset is applied to the second layer. This will
eliminate the unacceptable pairs from the second layer
inputs.

The process is repeated with succeeding layers until the
error rate on the selection subset is minimized. Although
further reductions in the error rate on the fitting subset
are realizable by incorporating more layers, to do so would
produce overfitting of the fitting data. Eventually, a single
output results from each of several disjoint subnetworks.
These outputs are added to produce a single output.
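
The selection step and the stopping rule can be sketched as follows. The ranking criterion (mean squared error on the selection subset), the fixed number of survivors `keep`, and the function names are assumptions; the text specifies only that poor performers are rejected and that layers are added until the selection-subset error is minimized.

```python
def mean_squared_error(predicted, desired):
    return sum((p - d) ** 2 for p, d in zip(predicted, desired)) / len(desired)


def select_survivors(candidate_outputs, desired, keep):
    """Keep the `keep` elements with the lowest selection-subset error.

    candidate_outputs maps an element label to that element's outputs
    on the selection subset.
    """
    ranked = sorted(
        candidate_outputs,
        key=lambda name: mean_squared_error(candidate_outputs[name], desired),
    )
    return ranked[:keep]


def layers_to_keep(selection_errors):
    """Stop at the first layer whose selection-subset error fails to drop."""
    kept = 1
    for previous, current in zip(selection_errors, selection_errors[1:]):
        if current >= previous:
            break
        kept += 1
    return kept
```

For selection-subset errors of 0.30, 0.20, 0.15 and 0.18 over four successive layers, `layers_to_keep` returns 3: the fourth layer would lower the fitting error further but begins to overfit.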

An example of the result of the training process up to
this point is shown in Figure 8. In this hypothetical
example, it is implied that at least 20 candidate
parameters were initially inserted into the first layer. Only
a few survived, as indicated. The figure shows that one pair
of inputs interacts with a second pair, but that two further
pairs do not interact with each other or with the other
pairs. Thus, there are three disjoint subnetworks whose
outputs are added to produce a single output.

There is a final step in the training process. This is a
process of vernier adjustment, or "fine tuning," of the
coefficients. If the need for this vernier adjustment arises it is
because the coefficients of each element have been
adjusted in the absence of interactions with other elements
following them in the network. The optimum coefficient
values may be different when these interactions occur.
The fitting and selection subsets are also used for this final
adjustment process. The vernier adjustment is a global
search and may use a random search technique to obtain
the final values of the coefficients. The same global search
adjustment may be used for subsequent adaptation of the
network.
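
A random search of the kind mentioned for the vernier adjustment can be sketched as below. This is a generic accept-if-better search with Gaussian perturbations, offered as an assumption; the text does not describe the specific search technique that was used, and the name `random_search` and its parameters are hypothetical.

```python
import random


def random_search(loss, start, steps=2000, scale=0.1, seed=0):
    """Perturb all coefficients at random; keep a trial only if it
    lowers the loss computed from the final network output."""
    rng = random.Random(seed)
    best = list(start)
    best_loss = loss(best)
    for _ in range(steps):
        trial = [w + rng.gauss(0.0, scale) for w in best]
        trial_loss = loss(trial)
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best, best_loss
```

Here `loss` would evaluate the whole network on the fitting and selection subsets for a given coefficient vector, so that interactions between layers are taken into account. Because a trial is accepted only when it improves the loss, the returned loss is never worse than the starting loss.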

After the final adjustment of coefficients, the evaluation
subset is used to estimate the performance of the entire
network.

Avoidance of overfitting is a key aspect of the training of
learning networks. Good functional approximations to the
fitting data subsets are obtained that are also good
approximations to the data in the separate selection subsets.
This means that the networks are taught to generalize
properly on their experience in fitting the points in the
first subsets and that error rates in later uses will therefore
be small. Without avoidance of overfitting, the networks
would give deceptively small errors in approximating their
first sets of data and then, in most cases, do poorly on
subsequent new data. We have all seen or heard of empirical
models that appeared to have much promise initially, but
that produced unacceptable errors when presented with
new data; in most cases, such behavior is the result of
overfitting. By using three independent subsets of the
available data, taking care that each is statistically
representative of the whole data base, the problem of
overfitting is virtually eliminated and good advance
estimates of operational error levels of the models are
obtained.

Figure 8. Illustrative learning network

A corollary to the guarantee that models realized by
learning networks are not overfitted is the fact that these
models are smoothly fitted functional approximations.
From the mathematical standpoint, they are continuous
and differentiable functions, and the derivatives of these
functions are close approximations to the quantitative
derivative behavior of the real processes that are modeled.
For this reason, one may compute numerical partial
derivatives of the form $\partial y / \partial x_i$. These derivatives reveal the
quantitative sensitivity of the modeled variable ($y$), and
thus of the process that has been modeled, to small
variations about specified values of the network input
variables. Once a process is modeled, or a curve fit, by
computer algorithm, the hardware elements (array)
described previously can be programmed for high-speed
real-time evaluation of new data.
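
A numerical partial derivative of a trained model can be estimated by a central difference, as in the sketch below. The model is treated as any Python callable taking a list of inputs; the function name and the step size `h` are assumptions.

```python
def partial_derivative(model, x, i, h=1e-5):
    """Central-difference estimate of dy/dx_i at the point x."""
    up = list(x)
    down = list(x)
    up[i] += h
    down[i] -= h
    return (model(up) - model(down)) / (2.0 * h)
```

For a quadratic such as $y = 1 + 2x_0 + 3x_0x_1 + x_1^2$, the estimates at the point (1, 2) are close to the exact sensitivities 8 and 7.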

AEROSPACE APPLICATIONS

Aerospace applications of the array network concepts
presented above include:

1.  nondestructive inspection of structural parts
2.  trajectory predictions
3.  target signature classifications
4.  radar refractive index corrections
5.  detection of remote nuclear events
6.  voice data processing
7.  reconnaissance image processing
8.  electronic warfare
9.  avionics information systems

The status of work in several of these areas will now be
summarized briefly.

Nondestructive inspection

One of the most representative applications to aerospace
systems of the array network concepts presented in this
paper is that of nondestructive inspection of critical
structural parts of aircraft or missiles. The classical
problem in inspection by such means as ultrasonic testing
is that small defects such as fatigue-induced microcracks
may be masked by reflecting surfaces such as fasteners
and component surfaces. Although much theoretical work
has been done to attempt to characterize the pulse echoes
obtained from microcracks and other defects, no
all-embracing theoretical formulation exists at this time. The
adaptive training of a multinomial network offers an
attractive approach to the processing of pulse echoes in the
complex signal environment typical of ultrasonic
inspection applications. Data may first be gathered on test
specimens having known properties and used to train the
network in accordance with the procedures outlined above.
Such work is progressing at the present time. It has
already been demonstrated that the adaptive learning
network can be trained to classify correctly flat-bottom hole
defects in 7075-T6 aluminum test blocks, using
transducers of various diameters and differing band-pass
characteristics, centered in the neighborhood of five
megahertz. As reported in Reference 16, in a total sample of 4[?]
flat-bottom hole defects, 4[?] were correctly classified by
this technique. An important aspect of the use of the
multinomial network method is that it is not necessary to
specify a priori the classifier input parameters that are
most informative for a given application. In the case of the
ultrasonic nondestructive inspection application,
candidate waveform parameters were considered during
network training. Fifteen of these parameters were found to
be relevant to the flat-bottom hole classification. These
parameters are those that describe the overall shape and
content (area) of certain parts of two waveforms computed
from the pulse echo waveforms: the power spectrum and
its log Fourier transform, the cepstrum. Interestingly,
maximum amplitude of the time waveform was not found
to be a discriminating parameter when the transducer
and/or transmission medium were subject to variation.
The resultant network structure found for flat-bottom hole
defect classification consists of 13 elements containing a
total of 78 coefficients, and implementing an eighth-degree
function of the 15 input variables.
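
The economy of such a network relative to a full polynomial can be illustrated by simple counting. The sketch below is an added illustration, not part of the reported work: it assumes each element carries the six coefficients of the two-input quadratic, and it counts the distinct monomials of a full polynomial of given total degree.

```python
from math import comb

COEFFS_PER_ELEMENT = 6  # w0 .. w5 of the two-input quadratic (assumed)


def network_coefficients(elements):
    """Coefficients stored by a network of two-input quadratic elements."""
    return elements * COEFFS_PER_ELEMENT


def full_polynomial_terms(variables, degree):
    """Distinct monomials of total degree <= `degree` in `variables` variables."""
    return comb(variables + degree, degree)
```

Under these assumptions, 13 elements give `network_coefficients(13)` = 78 coefficients, whereas a full eighth-degree polynomial in 15 variables would have `full_polynomial_terms(15, 8)` = 490,314 distinct terms.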

If application to the fatigue crack specimens is also
successful, the adaptive learning network will provide an
inspection tool of great value to the maintenance of aircraft
and missiles in the operational inventory. The work to
date has already demonstrated that the new procedures
allow one to synthesize signal processors that deal with
waveforms from the real environment and that learn which
parameters of these waveforms are the most relevant, while
combining these parameters into an adaptively
trained signal classifier. Other applications have been
suggested for the methodology in the related areas of
acoustic emission testing, computer-aided manufacturing,
optimization of metal removing processes, control of
forging and casting, inference of material physical properties
from microstructure data, and forecasting of maintenance
necessitated by stress corrosion deterioration of flight
vehicle structures. In each instance, the key point is that
networks can learn to infer or predict from the natural data
that are produced by the processes. These "natural" data
may be records of process sounds, vibrations, deterioration
due to corrosion, etc., that is, anything readily accessible for
economical instrumentation or recordkeeping.


Trajectory predictions

Successful R&D has been conducted for more than a
decade for application of multinomial networks to the
prediction of aerospace vehicle trajectories. In summary,
these investigations have established that the network
methods are capable of inferring vehicle parameters, such
as ballistic coefficients, quite accurately. Additionally, the
networks make extremely fast, accurate predictions of
object trajectories. These predictions are comparable in
accuracy to the conventional procedures whereby equations
of motion are integrated in serial computers, but are very
much faster because of the computing speed of the parallel
network structure.


Target signature classifications

Attention is being increasingly directed toward the
automatic classification of target signatures from single or
multiple sensors. An example of this work is that of
classifying the sources of ground vibrations and/or acoustic
emissions monitored by sensors in air-drop ordnance.
Very promising results are being obtained in such work.

Radar refractive index corrections

Reference 17 presents a new approach to the problem of
computing height correction for aircraft or other objects
tracked by surface radars. The basic procedure, when
using the multinomial network, is to operate a cooperative
aircraft that is equipped with an accurate radar altimeter
in those regions of airspace for which the true altitudes of
unknown targets are to be obtained. The cooperative
aircraft is tracked and an adaptive learning network is used
to model the relationship between observed range,
elevation angle and azimuth angle, and the independently
measured height of the cooperative target. It is then
possible to interrogate the network very quickly whenever
the true height of an unknown target is to be computed.
The inputs to the network become the apparent range,
elevation and azimuth of the unknown target, and the
network output is the estimated true height of that target.
The structure and coefficients of the network are adjusted
adaptively as atmospheric conditions change. The
reference presents a comparison between the accuracy of
this approach and that of the conventional procedure. The
height error, using the new approach, is approximately [?]
of the error obtained with the prior state-of-the-art
method.

Detection of remote nuclear events

Reference 18 presents results of work performed to assess
the accuracy of an adaptive learning network classifier for
discrimination between remote underground nuclear
events and deep-core earthquakes (which masquerade as
nuclear events in many cases). The results show that
nearly perfect discrimination between the two classes of
remote events is obtained.

Voice data processing

The application of adaptive learning networks to the
identification of spoken languages is discussed in
Reference 19, which presents the results of an
investigation into multinomial networks used to generate
nonlinear features of 29 phoneme and phoneme-like
parameters obtained from speech waveforms. The
networks are trained to discriminate between each pair of
languages in the set of languages to be identified. The
outputs of the networks are then input to a decision logic
to identify which language is being spoken. In the case of
five languages to be discriminated, this means that a
group of nonlinear transformations is produced by 10
individual networks, each of which maps the 29-dimensional
input space onto a component of a 10-dimensional output
space. The structure of a typical trained network is shown
in Figure 9. The results of the cited investigation show that
significant improvement is obtained in the accuracy of
language identification and in insensitivity to
idiosyncrasies of individual speakers.
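
The pairwise arrangement can be sketched as follows: five languages give 10 pairs, one discriminating network per pair, and a decision logic combines the 10 outputs. The majority-vote decision rule, the sign convention (a positive output favors the first language of the pair), and all names are assumptions; the text does not specify the decision logic.

```python
from itertools import combinations


def pairwise_vote(languages, discriminators, features):
    """Identify a language from the outputs of the pairwise networks.

    discriminators maps a pair (a, b) to a callable whose output is
    positive when the features look more like language a than b.
    """
    votes = {lang: 0 for lang in languages}
    for a, b in combinations(languages, 2):
        score = discriminators[(a, b)](features)
        votes[a if score > 0 else b] += 1
    # Ties are broken in favor of the language listed first.
    return max(languages, key=lambda lang: votes[lang])
```

In this sketch `features` would be the 29 phoneme and phoneme-like parameters, and each entry of `discriminators` one trained network.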


SUMMARY AND CONCLUSIONS

Elements for use in programmable arrays have been
defined, and design approaches and performance indicated
for alternate organizations. By giving an element the
capability of evaluating one of five multinomial expressions
with pre-set coefficients, upon command, and by providing
means for interconnecting arrays of elements in various
configurations, a number of signal processing functions
can be realized. These include:

Classification, Prediction and Control
Hypersurface Computation
Digital Filtering (Recursive, Transversal)
Transformations


The arrays can, in fact, be alternately and rapidly
reconfigured to do any of the above types of tasks. In that the
transformation coefficients and multinomial degree and
form can be readily controlled, an array can be trained to
suit a particular situation.

Two types of design approaches were studied for
realizing the hardware of a programmable element, one based
on a parallel pipeline multiplier and the other on a
serial-type multiplier. In each case the multipliers would be
realized by appropriate custom LSI CMOS/SOS
multipliers.

Considering that some or all of the speed difference
between the two approaches can be compensated for, when
required, by using a greater degree of parallelism in a net
organization (Figure 2), the serial multiplier approach was
recommended for development. With a full 32-bit floating
point (24-bit mantissa, 8-bit exponent) capability, using a
total of 10 of the custom LSI CMOS/SOS serial-type multiplier
packages supported by 10 packages accommodating
custom LSI control chips, a full six-term multinomial of 2
inputs can be evaluated in 5.2 microseconds. For 16-bit
fixed point capability the control chips reduce to two, and
the computation period reduces to 2.9 microseconds. The
random-access memory to support each element will
depend on its degree of multiplexing. For a fully populated
array, up to 3 RAM packages would be required per
element.

The  advantages  of  such  arrays  include; 

1.  High  speed  capability  for  general-purpose  digital 
processing.  The  speed  can  be  tailored  by  providing 
varying  degrees  of  parallelism  of  the  element  in  an 
array. 

2.  Capability for relieving software problems. The array
can be considered as firmware once a set of
coefficient and interconnect vectors have been found, or
are known, for a particular application.

3.  Capability for providing fault-tolerant computing. In
general, alternate coefficient and interconnect vectors
exist to realize the same transformation. Hence, if an
element in a fully populated array is faulty,
reprogramming of the array is possible by bypassing
any faulty element(s). The degree to which this
bypassing is done will, of course, depend upon the
number of elements available and the complexity of
the transformation.

ABSTRACT

In the past two years, significant advances have
been made in parallel array processing, particularly
in the two-dimensional programmable arrays.
These arrays are based on elements capable of
evaluating, numerically, quadratic multinomials
of two variables. Specifically, a degree of
advanced LSI technology has been applied to the
arithmetic unit of such elements, and some of
the control problems were addressed and design
approaches developed. A number of new application
areas were identified and work was done to
demonstrate solutions with the array techniques.

The relationship between these polynomial modeling
arrays and other parallel array processing
techniques is established. Then, the concepts behind
these arrays that deal with their use in adaptive
modes as well as in more conventional high-speed
signal processing are discussed. Degrees of
hardware that can be applied in simulating and
implementing these arrays are shown. Concepts of
multiplexing, distributed memory and full parallel
processing are discussed along with appropriate
circuit-technology impact. Finally,
application areas are given.

INTRODUCTION & BACKGROUND

This paper deals with arrays (physically realizable
as one-, two-, or three-dimensional) of
computing elements which can be programmed to
solve or model particular input-output relationships.
Various terms have been associated with
such arrays, including:

distributed logic/memory processing
associative processing
cellular arrays
programmable arrays

The element (or primitive function) is, in effect,
the basic building block of such arrays and can be
given any one of a variety of characteristics.
Generally, the element contains logic and memory
circuits such that the output is a function of
stored coefficients as well as the input data set.
They can be given the ability to operate on
1) binary variables, 2) numerical variables
(binary encoded), or 3) analog variables.
Numerical elements restricted to evaluating
a given type or set of expressions, such
as polynomial expressions, will be the topic of
discussion in this paper.

Each of the three types of elements can
theoretically be structured in arrays of arbitrary
complexity. The advantages of such arrays,
compared to single element realizations, include
the possibility of higher throughputs, and fault
tolerance.

The structure of arrays is demonstrated in
Fig. 1, which shows examples of one- and
two-dimensional nets. An example of a
one-dimensional array useful for signal processing
is a transversal filter, Fig. 1(a). In this
application each input is multiplied by a
weight (the element function is multiplication)
and the outputs are summed. The fact that the
inputs are frequently time-delayed samples of a
given waveform imparts the additional requirement
on the element of shifting. Such "arrays"
can be realized by CCDs or Surface Acoustic
Wave (SAW) devices (for analog processing),
binary correlators (for binary or clipped signals)
or arithmetic units as in a digital filter
operating on A/D converted signals.
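
The transversal filter described above reduces to a few lines of code. The sketch below is a plain digital model under simple assumptions (a delay line initialized to zero, one weight per tap); the function name is hypothetical.

```python
def transversal_filter(samples, weights):
    """Multiply time-delayed samples by their weights and sum the products."""
    delay_line = [0.0] * len(weights)
    out = []
    for s in samples:
        # The shifting requirement: each new sample pushes the older
        # samples one tap further down the delay line.
        delay_line = [s] + delay_line[:-1]
        out.append(sum(w * d for w, d in zip(weights, delay_line)))
    return out
```

Feeding a unit impulse through the filter returns the weights themselves, one per time step, which is a convenient check on the tap ordering.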

Two examples of two-dimensional array formats are
shown in Figs. 1(b) and (c). In the first case,
the elements are arranged in layers and information
flows from the input data set to the output. Each
element's output goes to an input in the next
layer. An example of this type of array is the
polynomial evaluator net now undergoing design and
development with special LSI circuits; this net is
the main subject of this discussion on parallel
array processing. Another two-dimensional
structure is the so-called cellular array proposed for
binary image processing, where each element (or
cell) communicates with each surrounding cell.
Data are manipulated according to "surround
patterns" and a stored base of algorithms.
Information is entered in two dimensions and
processed in two dimensions.

There is one more important aspect to such arrays,
which has in fact been responsible for much of the
research effort expended for their implementation
and application. This is their potential for being
reconfigured to suit a number of different problems,
or, when used for a given transformation, to be
adapted to changes in the environment by virtue of
being able to have their weights or coefficients
modified. In describing these processes the
array type of Fig. 1(b) will be used as an example,
although analogies exist for all types of arrays.

It is assumed that each element's output Y is a
function of its inputs (X_i's) and some constants
or weights (W_i's), and furthermore, that each
element in a layer can be interconnected to any


470  NAECON '76  RECORD  247 




consists of N observables, x_1, x_2, ..., x_N.

Also suppose that the output is a scalar whose
value may be considered as the estimate of some
property of the input process. In general, y
will be some nonlinear function of the x_i's as
follows:

$$y = f(x_1, x_2, \ldots, x_N) \qquad (1)$$

POLYNOMIAL (MULTINOMIAL) APPROXIMATION

Under fairly general conditions, a function of
N variables may be expressed in an N-dimensional
Maclaurin series as follows:

$$y = a_0 + \sum_{i=1}^{N} a_i x_i + \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} x_i x_j + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ijk} x_i x_j x_k + \cdots \qquad (2)$$

In the most general case, the coefficients, a_0,
a_i, a_ij, ..., are functions of time, but for many cases
of interest, the underlying characteristics of the
x's do not depend on time and consequently the
coefficients are constants.

A trainable network may be used to implement
Eq. 2. A basic element of the network is the
two-input, single-output device illustrated
in Fig. 2(a) (and previously alluded to), which
implements the following function (Eq. 3) of its
inputs x_1, x_2:

$$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2 \qquad (3)$$

NETWORKS OF THE BASIC ELEMENT


Fig. 2 - (a), Basic element; (b), 2-layered network
of elements.

Consider now two layers of such elements as
illustrated in Fig. 2(b). The figure shows
that each z_j contains pairwise products up
to degree four. Note that the first layer
contains all possible pairs of three inputs
x_1, x_2, x_3. To implement a general
multinomial expression (i.e., a polynomial in
many variables), the number of elements in
each layer would have to grow as one proceeds
deeper into the network. However, it is found
empirically that acceptable approximations
are obtained without this growth; in fact,
the number of elements in successive layers
will soon (after, perhaps, two or three layers)
decrease, until only a few are left as inputs
to the final network component (which is an
adder).
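
How the degree grows from layer to layer can be seen in a small Python sketch. It assumes a six-coefficient quadratic form for the two-input element (constant, two linear terms, one cross product, two squares); the function names and the weight settings are hypothetical and chosen only to make the degree growth visible.

```python
from itertools import combinations


def element(w, x1, x2):
    """Two-input element: an assumed six-coefficient quadratic in x1 and x2."""
    return w[0] + w[1]*x1 + w[2]*x2 + w[3]*x1*x2 + w[4]*x1*x1 + w[5]*x2*x2


def layer(weights, inputs):
    """Feed every pair of inputs to its own element; one weight vector per pair."""
    pairs = list(combinations(range(len(inputs)), 2))
    return [element(w, inputs[i], inputs[j]) for w, (i, j) in zip(weights, pairs)]


x = [1.0, 2.0, 3.0]
product_only = [0, 0, 0, 1, 0, 0]            # element reduces to x1 * x2
layer1 = layer([product_only] * 3, x)        # pairs (x1,x2), (x1,x3), (x2,x3)
layer2 = layer([product_only] * 3, layer1)   # products of products: degree four
```

With three inputs the first layer holds all three pairs, and each second-layer output is a product of two second-degree terms, that is, a fourth-degree term in the original inputs.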

THE KNOWN DATA SET

The matter of determining the coefficients of
each network element and the number and
interconnections of the elements are now considered.

These  tasks  are  accomplished  with  a “known" 
data  base;  that  is,  a data  base  for  which  the 
values  of  the  dependent  variable  are  known. 

The  steps  involved  are: 

1.  Optimization  of  the  coefficients  in 
each  element  of  the  first  layer, 

2.  Selection  of  those  elements  whose 
output  is  acceptable  (rejection  of 
poor  performers), 

3.  Repetition  of  the  process  for  each 
layer,  and 

4.  A global  optimization  (adaptation) 
of  all  coefficients  in  all  layers 
based  upon  the  final  output. 

The known data base is divided into three
independent but statistically similar
subsets:

1.  Fitting  subset 

2.  Selection  subset 

3.  Evaluation  subset 

The fitting subset is used to determine
the coefficients of the elements. The
selection subset is used to reject the
poor performers and to determine when to
stop evolution of the network. The fitting
and selection subsets are also used for the
global optimization. The evaluation subset
is used to estimate the overall performance.

Since the evaluation subset was not used for
network synthesis, the performance on this
subset is an accurate estimate of the ability of
the network to generalize to new, previously
unseen data.
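
One simple way to obtain three independent but statistically similar subsets is to shuffle the known records and deal them out in turn. The sketch below assumes equal-sized subsets and a fixed random seed; the paper does not state the proportions used, so both are illustrative choices.

```python
import random


def split_known_data(records, seed=0):
    """Shuffle the known records and deal them into three disjoint subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    fitting = shuffled[0::3]      # used to determine element coefficients
    selection = shuffled[1::3]    # used to reject poor performers
    evaluation = shuffled[2::3]   # held back to estimate overall performance
    return fitting, selection, evaluation


fitting, selection, evaluation = split_known_data(range(30))
```

Because the evaluation records never influence the coefficients or the choice of elements, the error measured on them is not biased by the synthesis procedure.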

TRAINING  THE  NETWORK 

The element coefficient determinations are
based, in part, upon a least-squares fit to a
desired output. Other criteria are, of course,
possible, and are often used. Employing a
least-squares criterion, the elements are first adjusted
by a matrix algebraic procedure and then by a
recursive search (i.e., optimization) procedure.

An outline of the steps follows.

The fitting and selection subsets are used
alternately in training each layer. The fitting subset is used
first to establish the coefficients. The specific
observables to be used initially have already been
chosen. Let these be designated by x_1, ..., x_N.

These observables are arranged in pairs, x_i, x_j,
where i, j = 1, ..., N. There are N(N-1)/2 such
pairs. Thus, the first trial will require N(N-1)/2
trainable elements (such as that shown in Fig. 2(a)).
A pair of observables is sent to each element. The
must be incorporated. The first
generation of parallel array processing
approaches was limited in application
because of the relatively
large numbers of components required
to populate an array. Hence, most
work on arrays has been done in software
simulation of the actual hardware
concepts.

2)  Control. The problems of data entry,
reconfigurability, supervisory control,
and system communication must be
addressed.


Fig. 1 - Parallel array types: (a), 1-dimensional;
(b) and (c), 2-dimensional.


element  in  a preceding  or  succeeding  layer.  Then 
the  overall  array  can  synthesize  a wide  variety  of 
transformations  between  the  inputs  and  output(s)*, 
limited  only  by  the  element's  primitive 
function  and  the  total  number  of  elements. 

The features of weight variability and
interconnect flexibility allow such nets to be
trained or to fit a desired input-output
characteristic, as will be described later.

Then, a given numerical processing array,
for example, could evaluate a known
transformation such as an FFT, as well as heuristic,
nonlinear transformations found by search
algorithms.

The problems associated with parallel array
processing include:

1)  Reasonable hardware implementations.
To capitalize on the potentials of
such arrays for high-speed signal
processing, latest-technology LSI

* In  general,  nets  may  have  more  than  one  output. 


3)  Application. Many of the problems which
can benefit from parallel array processing
are frequently, in essence, re-stated so
as to fit more conventional solutions.
These problems must be examined in a new
light to effectively apply the new
techniques.

The  methodology  for  training,  trends  in  hardware 
realization  of  such  arrays,  and  applications  will 
be  discussed. 

TRAINING AND ADAPTATION OF MULTINOMIAL NETWORKS

The methods for training and adaptation of
algebraic multinomial networks that are constructed
from the arithmetic elements described above are
now considered. This is done with special
algorithms, generally off-line, that specify
network configurations. Applications of multinomial
networks generally involve operations with sensor
data that are obtained as a result of "observing"
a physical object, process, or phenomenon. The
classical approach to design of computer models for
inferences and predictions from observations has
been to determine all the relevant characteristics,
deterministic and/or statistical, of the
process being observed, and to use these
measurements (and assumptions) in design
calculations.

Very  often  the  structure  of  the  model  is 
presumed  and  the  design  takes  the  form  of 
calculating  the  values  of  certain  parameters. 

Even  if  the  nature  of  the  observed  process 
changes,  the  structure  of  the  model  often 
does  not  change,  but  the  parameter  values 
are  adjusted  in  response  to  measured  changes 
in  the  inputs  or  outputs. 

In many important applications, the inputs
(i.e., the observables) are difficult to
describe analytically. The best or even a
good structure for the model cannot be
determined a priori. In this case, it is
desirable to have a model structure that
can adjust to representative inputs. That is,
the model is trainable, both in its structure
and in its parameter values.

It is therefore desired to implement a general
(usually nonlinear) function of certain input
variables which we can call observables. Since
little may be known about the characteristics
of the observables, the parameters of the
network are not known a priori. The network will
have to be trained with representative inputs.

To make the ideas clear, suppose that the input
coefficients of the element are determined using a
least-squares criterion procedure. The procedure
is repeated for each of the N(N-1)/2 elements. Not
all of the pairwise combinations are of significant
aid in extracting the desired information. The
selection process, using the selection subset,
eliminates those elements whose performance is
not acceptable. The performance may be measured
by the square of the error magnitude. Now there
are perhaps R elements which survive. The process
is repeated for the second layer. This layer
starts with R(R-1)/2 elements. The training
subset of observables, restricted to those which
survived the first selection process, is inserted
into the first layer, resulting in R(R-1)/2
pairs of inputs into the R(R-1)/2 initial
elements of the second layer. Then the
selection subset is inserted into the first
layer and the selection process is applied
to the second layer. This procedure will
eliminate the unacceptable pairs from the
second layer inputs.

The process is repeated with succeeding layers
until the error rate on the selection subset
is minimized. Although further reductions
in the error rate on the fitting subset are
realizable by incorporating more layers, to
do so would produce over-fitting of the
fitting data. Eventually, a single output
results from each of several disjoint
sub-networks. These outputs are added to
produce a single output. There is a final
step in the training process. This is a
process of vernier adjustment, or "fine
tuning" of the coefficients.
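
The fit-then-select procedure for a single layer can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the element is assumed to be a six-term quadratic, the fit uses `numpy.linalg.lstsq`, survivors are chosen by mean squared error on the selection subset, and the names `design`, `train_layer`, and `keep` are hypothetical. The recursive search and the final fine tuning are not shown.

```python
import numpy as np
from itertools import combinations


def design(x1, x2):
    """Columns for the assumed element terms: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])


def train_layer(X_fit, y_fit, X_sel, y_sel, keep):
    """Fit one element per input pair, then keep the best `keep` by selection error."""
    candidates = []
    for i, j in combinations(range(X_fit.shape[1]), 2):
        # Least-squares coefficients from the fitting subset only.
        w, *_ = np.linalg.lstsq(design(X_fit[:, i], X_fit[:, j]), y_fit, rcond=None)
        # Score the element on the selection subset (squared-error measure).
        sel_out = design(X_sel[:, i], X_sel[:, j]) @ w
        err = float(np.mean((sel_out - y_sel) ** 2))
        candidates.append((err, (i, j), w))
    candidates.sort(key=lambda c: c[0])
    return candidates[:keep]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] * X[:, 1]            # depends only on inputs 0 and 1
survivors = train_layer(X[:100], y[:100], X[100:], y[100:], keep=3)
best_err, best_pair, best_w = survivors[0]
```

With four observables there are 4(4-1)/2 = 6 candidate elements; the outputs of the survivors would become the inputs to the next layer, and layers would be added until the selection error stops decreasing.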

Once a process is modeled, or a function fitted
by a computer algorithm, the hardware elements
(array) described previously can be programmed
for high-speed real-time evaluation of new
data.

ARRAY HARDWARE

In the simulation and realization of complex
polynomial modeling nets, the degrees of
hardware (and associated speed improvement)
are:

1.  All-software simulation

2.  Hardware multiply

3.  Polynomial evaluator

4.  Element

5.  Layer of elements (two or more
elements)

6.  Full array (two or more layers)

7.  Adaptation algorithm hardware

Most of the work, to date, in array processing
has been done in software (1). Whether for
training or for actual control and classification,
software implementations result in inordinately
long periods of calculation and hence cost.
Software, which is impractical for all but the
slowest speed control problems, has spurred the
development of hardware to effectively deal with
complex modeling and transformation problems at
lower costs and/or in real time. Of course other
advantages will accrue with the hardware, such as
improved reliability due to redundancy.

The first and most basic hardware improvement which
can be imparted to a computer used for polynomial
modeling using arrays is the incorporation of a
hardware multiplier (2) with associated input-output
buffer registers and control logic. This feature
can improve a software multiply period anywhere
from 50 to 500 times depending on the computer
(microprocessor or miniprocessor). Next, instead
of transferring a single multiplier and
multiplicand to the hardware multiplier, groups of
numbers can be transferred for evaluation of a
complete polynomial expression (3), thus shortening
overall transfer periods. In this concept, two
or more multipliers can be used. The element (4)
is an extension of the polynomial evaluator, and
is defined as being capable of evaluating any
number of polynomial expressions, storing
intermediate results and using them in a preprogrammed
manner. The arithmetic unit of the element has
a number of multipliers to expedite calculations,
since a minimum of time is necessary for computer
interface. The element has sufficient storage for
all the inputs (x_i's), coefficients (w_ij's), outputs
(y_i's), and connection controls (c_j's) necessary
for an array of a given capacity. The minimum
improvement in speed provided by the element is
about 1000 to 1 when compared to software on a
high-speed processor.

Up to this degree of hardware (4) a simple I/O
bus structure can be used to communicate between
host processor and the multiplier, polynomial
evaluator, or element. For concepts (5) or (6),
full parallel array processing is possible. To
achieve this, two techniques are required: loading
and switching. Fig. 3 shows a multi-layered
structure being loaded from a common bus by the
host computer. Layer 1 outputs are switched to
appropriate layer 2 inputs under the space-division
switch memory control (also pre-loaded by the host
computer).

Fig. 3 - Loading and controlling a parallel array
(Layer 1, Layer 2).

The speed improvements inherent with multi-element
arrays compared to a single multiplexed element
are proportional to the number of elements
populating an array and depend on whether multi-layer
implementations can be used in a pipeline mode to
further increase throughput. With maximum
employment of LSI for the element realization, approaches
(5) and (6) become viable alternatives for
improved-performance, parallel array processing. Since each
element in the numerical polynomial processing
concept can handle two inputs, the number of
elements in a layer, for maximum speed, for single
output nets, depends on the total number of
observables or variables bearing on the problem,
divided by two. Most previous applications have
indicated a maximum requirement of 20 elements in
the first layer. In the distributed element case
(5 and 6), the total memory requirements are the
same as for the multiplexed or time shared case,
except that the memory is physically
partitioned along with the arithmetic function.
This arrangement will lead to different
memory organizations.

For example, whereas a moderate size RAM
will be used with a single time-shared
element, smaller register stacks would be
provided with a distributed element
structure, as part of their arithmetic units.
This would further simplify buffering and
interconnections.

ADAPTATION ALGORITHM HARDWARE

An array can be considered a high-speed
numerical calculator for deterministic
type problems (problems whose input-output
relationships or transformations are known
a priori). An FFT is an example of a known,
useful linear transformation. More
interestingly, and of greater importance, the array
concept can be used to synthesize arbitrarily
complex linear or nonlinear expressions and to
help find (in a training or adaptation mode)
a suitable relationship between a set of
input data and a known output result, as
described previously. Once a relationship
is found, the array can be used in its
deterministic mode, appropriately responding
to new sets of data from the same physical
system in providing desired outputs. In the
adaptation mode, the hardware element(s) works
very closely with the host computer, which has
been pre-loaded with algorithms for random
search of the discriminating hypersurface, for
example. Here the elements are used as an aid
in speeding up the adaptation approach which
is, basically, a trial and error procedure
requiring a large number of calculations to
"home in" on a desired or acceptable
relationship.

Various amounts of hardware can be applied to
the adaptation mode to alleviate the
computer-element communications problem, where, after
each trial, the host computer must determine
a new set of weights for use in the next trial,
etc. For these types of algorithms,
random-number generators can be provided, associated
with the element, with appropriate control logic
and memory, so that in the array adaptation mode,
input data are paired and element weights
adjusted automatically and rapidly, in an effort
to minimize errors between known, desired
responses and the calculated output responses.

Weights derived from random-number generators
can be incremented or decremented by given
amounts according to preset or manipulated
statistical distributions. This type of hardware
can eventually be coupled to the array
itself to augment high-speed self-adjustable
transformations, replacing the software
algorithms.
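
The trial-and-error character of random-search adaptation can be illustrated with a short Python sketch. It is a generic accept-if-better random search under assumed settings (Gaussian increments, fixed step size, a toy two-weight problem); the paper does not specify the search rule or the distributions used, so every name and parameter here is hypothetical.

```python
import random


def random_search(error_fn, weights, step=0.1, trials=2000, seed=0):
    """Perturb the weights at random; keep a trial only if the error decreases."""
    rng = random.Random(seed)
    best = list(weights)
    best_err = error_fn(best)
    for _ in range(trials):
        # Increment or decrement every weight by a random amount.
        trial = [w + rng.gauss(0.0, step) for w in best]
        err = error_fn(trial)
        if err < best_err:
            best, best_err = trial, err
    return best, best_err


# Toy problem: recover the weights of y = 2*x1 - 3*x2 from known responses.
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0), ((1.0, 1.0), -1.0)]


def squared_error(w):
    return sum((w[0] * x1 + w[1] * x2 - y) ** 2 for (x1, x2), y in data)


start = [0.0, 0.0]
weights, err = random_search(squared_error, start)
```

Each trial costs one full evaluation of the network on the known data, which is why moving the evaluation (and eventually the random-number generation) into hardware shortens the adaptation time.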

HARDWARE FACTORS

The overall hardware requirements of a
programmable array capable of complex polynomial
modeling will depend on throughput requirements.

The other factors influencing hardware
complexity are:

1.  Precision of numerical calculations

2.  Fixed- or floating-point notation

3.  Total memory requirements

4.  Level of LSI applied

5.  Calculation speed

Shallow nets with relatively few layers do
not require the precision of deep, highly
nonlinear nets, and can use 12- to 16-bit, fixed
point numbers. However, for complex situations,
after the third or fourth layer, 12-bit,
floating-point representation becomes important. Hence,
an element with varying capability would be
desired to optimize its use within an array.

Since loading an array's memory from the host
processor can consume large periods of time, the
array's memory, whether locally concentrated for
single elements or distributed for a fully
populated array of elements, should have the
capacity for as many problems as the unit will be
called upon to solve in a given system. For
example, the same array might serve as a
classifier for two or more problems, where the
coefficients and element interconnectivity are all
different. If the array's memory had all the
necessary values stored for both problems, a
single command could have the array provide
classification for either problem with virtually
no added delay as a result of problem
changeover.

TECHNOLOGY CONSIDERATIONS

What technology should polynomial arrays use
for optimum performance? LSI permits us to
consider fully populating large parallel arrays
for high throughputs. However, the partitioning
of the array element into its constituent
functional parts must be based upon a realistic
assessment of both available and developmental
technology trends. This is particularly true
for the memory and arithmetic portions of the
element. The fundamental processes of
multiplication and addition must be efficiently
implemented and matched to supporting memory
for accessing data and coefficients (or
weights). It was shown that the profusion
The network for flat-bottom hole defect inference
consists of 11 elements containing a total of
78 coefficients. The network implements an
eighth-degree function of the 15 input variables.

Work is now proceeding on inference of the size of
fatigue-induced cracks emanating from fastener
holes in specimens representative of wing structures.
The problem in the use of ultrasonic inspection is
that small defects, such as fatigue microcracks,
may be masked by reflecting surfaces of the
fastener and the part itself. Adaptively trained
networks offer an attractive approach because
they do not rely on theoretical formulations for
these complex signal environments.

MATERIALS TECHNOLOGY

Potential applications of adaptively trained
nonlinear networks in materials technology include:

Characterization of properties of composites
and of adhesively-bonded assemblies

Inference of material physical properties
from microstructure data

Productivity modeling

Monitoring and optimization of processes
for material removal

Predictive control of melting, forging,
casting, extruding, and rolling processes

Forecasting of maintenance necessitated by
corrosion of flight vehicle structures

Inference of NDE effectiveness of facilities
by their performance on test samples

In each of these instances, the key point is
that networks can learn to infer or predict from
the natural data that are produced by real
processes. These "natural" data may be records
of sounds, vibrations, etc.; anything readily
accessible for economical instrumentation or
recordkeeping.

PHYSIOLOGICAL  MONITORING 

Adaptive nonlinear network modeling techniques
have been successfully applied to automatic
interpretation of animal and human EKG's and
other physiological waveforms. It is
conceivable that future manned systems will
use trainable multinomial networks to alert crew
members and/or remote medical personnel to subtle
changes in alertness and psychological and
physiological readiness of the crew.

DIGITAL SIGNAL PROCESSING

As noted elsewhere in this paper, adaptive
programmable networks can potentially implement
an extremely wide variety of linear and nonlinear
signal processing functions. These functions
include:

Auto- and cross-correlation functions
Fast Fourier transforms
Power spectra and cepstra
Digital bandpass filtering
Extraction of time derivatives
Transversal filtering
Inference of waveform nonlinear parameters
Nonlinear predictive coding

By combining these (pre-processing) network
functions with other network functions that
implement the main classification, prediction,
decision, and/or control logic, the functional
possibilities become boundless. Clearly, the
trend will be toward realization of very fast
LSI arrays so that the array hardware can be
multiplexed rapidly to realize powerful processing
combinations.

TRAJECTORY PREDICTIONS

Successful R&D has been conducted for more than
a decade by application of multinomial networks
to prediction of aerospace vehicle trajectories.

In summary, these investigations have established
that the network methods are capable of
inferring vehicle parameters, such as
ballistic coefficients, quite accurately, and
that the networks can make extremely fast,
accurate predictions of ballistic trajectories.
These predictions are comparable in accuracy
to the conventional procedures where
equations of motion are integrated in serial
computers, but are very much faster because of
the computing speed of the parallel network
structure.

TARGET SIGNATURE CLASSIFICATION

Seismic and/or acoustic waveforms from remote
sensors on the ground can be used with nonlinear
networks to detect and classify accurately the
presence of different types of ground vehicles,
aircraft, and of personnel.

RADAR  REFRACTIVE-INDEX  CORRECTIONS 

A nonlinear adaptive network can be used to fit
radar metric data on the range, elevation,
azimuth, and true height of a cooperative target
or targets. After fitting a network model to
these data, the model may be interrogated to
estimate true heights of any other targets
observed within the bounds of the range, elevation,
and azimuth training of the network. By taking
care not to overfit the model during its training,
it will be a smoothly-fitted functional
approximation. From the mathematical standpoint the
model will be continuous and differentiable,
and its derivatives will be close approximations
to the quantitative derivative behavior of the
physical process (i.e., ray bending as a result
of refractive-index gradients within the
atmosphere). Numerical partial derivatives of the form
∂y/∂x can be used to determine the slope of the
ray as it traverses the anomalous part of the
atmosphere, presumably the part that has been
adaptively modeled. Knowing this slope, the path
of the ray beyond the modeled region may be
estimated by conventional procedures.
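
A numerical partial derivative of a fitted model can be obtained by a central difference, as in the Python sketch below. The function `fitted_height` is a hypothetical stand-in for a trained network (it is not a refraction model and its coefficients are invented); the step size `h` is likewise an assumption.

```python
def partial_derivative(model, point, index, h=1e-5):
    """Central-difference estimate of d(model)/d(point[index])."""
    up = list(point)
    up[index] += h
    down = list(point)
    down[index] -= h
    return (model(up) - model(down)) / (2.0 * h)


# Hypothetical stand-in for a fitted network: height as a smooth function
# of range r and elevation e. The coefficients are illustrative only.
def fitted_height(p):
    r, e = p
    return r * e + 0.01 * r * r


# Slope of the modeled height with respect to range at r = 10, e = 0.2.
slope = partial_derivative(fitted_height, [10.0, 0.2], index=0)
```

For this stand-in the analytic derivative with respect to range is e + 0.02 r, which equals 0.4 at the chosen point, so the estimate can be checked directly. The approach relies on the model being smooth, which is why the text stresses avoiding overfitting.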

The results in reference 14 show that a substantial
improvement in accuracy of target height
determinations can be achieved using the network
modeling method in comparison with the usual method
for refractive-index corrections.

DETECTION OF REMOTE NUCLEAR EVENTS

Reference  presents  results  of  work  performed 
of multiplications required for an
N x N polynomial modeling could best be
implemented with a number of LSI serial/parallel
type multipliers in a bit serial - word parallel
fashion rather than parallel array type
multipliers which, because of their complexity,
would have to be used in a bit parallel - word
serial fashion. This scheme provided a minimum
calculation time-complexity factor (an
architectural figure of merit), while ultimately being
able to approach speeds of all-parallel
multiplier implementations.

The serial nature of the basic multiplier
simplifies scaling for floating point (exponent
control) and addition, as well as minimizing
pin connections. Furthermore, because of the
predominantly "on-chip" nature of the
multiplication logic (a single output driver only
being required), the CMOS/SOS technology offered
an ideal medium for the LSI implementation. Its
relatively high packing density and low power
dissipation resulted in a full 24-bit
multiplier-accumulator (capable of an (ax + b) calculation)
on only a 150- by 170-mil chip, leaving room for
substantial additional logic or memory growth,
while still staying within the realm of high yield
technology. Alternate technologies could improve
speed at a high chip count (cost) and power
dissipation. An ECL implementation could improve
speed by factors of three to five, but chip power
and area constraints would inhibit the size of the
multiplier to about eight or twelve bits (instead
of 24) and preclude the ability to add associated
circuitry.
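
The idea of a serial multiplier-accumulator can be illustrated in software by consuming one operand a bit at a time and adding a shifted copy of the other operand for each set bit. The sketch below is only an arithmetic analogy under assumed conventions (unsigned integers, least significant bit first, a 24-bit operand width); it does not describe the logic of the actual chip.

```python
def serial_multiply_accumulate(a, x, b, bits=24):
    """Compute a*x + b for unsigned integers, consuming x one bit per step."""
    assert 0 <= x < (1 << bits)
    acc = b                        # the accumulator starts with the addend
    for k in range(bits):          # least significant bit first
        if (x >> k) & 1:
            acc += a << k          # add the shifted multiplicand
    return acc


result = serial_multiply_accumulate(a=1234, x=5678, b=9)
```

Only one bit of `x` is needed per step, which is the property that keeps the number of connections small; running many such units side by side gives the bit serial, word parallel arrangement described above.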

Another consideration in the element architecture
is the required memory and its interrelationship
with the multipliers in executing the polynomial
calculations. Each element in an array (physical
or simulated) must have a store containing all
the coefficients, data and commands it will need
for a given transformation or set of
transformations for rapid recall at the appropriate time.
This store is necessary to minimize "outside
world" interaction, since the relatively slow
rates at which a host processor can load new
sets of values into the array would defeat the
purpose of the high-speed processing capability
of the array.

Presently, array designs are being based on the
use of MOS RAM's, providing access times in the
order of 100 nanoseconds. Future trends will
lead to the use of CCD serial memories, whose
characteristics will optimally fit the array
architecture, at projected cost for memory of
about 1/10 the cost of advanced RAM's. An
array organization based on the use of CCD
memory is shown in Fig. 4. Each element's
memory would be loaded with all the w's it will
need to solve the problem(s) that the array will
be used for. These w's must of course be pre-
loaded in the proper sequence by the host
computer. Interface logic and buffering
requirements are minimized because the serial
natures of the multipliers and memories are
compatible. With one or two chip types used
for multiplication and associated control, and with
available, low cost, memory circuits, a wide
performance spectrum of parallel arrays can be
implemented.

Fig. 4 - Serial memory-bank organization for
programmable array.
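The pre-loaded serial store described above can be pictured with a
short Python sketch. It is only an illustration under stated
assumptions: the class name, the function name, and the use of a
recirculating queue to stand in for a CCD shift-register memory are
hypothetical and are not taken from the report. The sketch shows why
the host must load the w's in the order of use: the element can only
take whatever coefficient is next in the loop.

```python
from collections import deque


class SerialCoefficientStore:
    """Hypothetical recirculating serial store (stand-in for a CCD memory)."""

    def __init__(self, coefficients):
        # The host pre-loads the w's in the exact order the element uses them.
        self._cells = deque(coefficients)

    def shift_out(self):
        """Return the next coefficient and recirculate it to the back of the loop."""
        w = self._cells.popleft()
        self._cells.append(w)
        return w


def weighted_sum(store, terms):
    """Multiply each input term by the next coefficient shifted out of the store."""
    return sum(store.shift_out() * term for term in terms)


# Three coefficients, pre-loaded in order of use (illustrative values).
store = SerialCoefficientStore([0.5, 1.0, -1.0])
first_pass = weighted_sum(store, (1.0, 2.0, 4.0))
# After one full pass the loop has recirculated to its starting position,
# so the same transformation can be repeated without reloading from the host.
second_pass = weighted_sum(store, (1.0, 2.0, 4.0))
```

No addressing logic is needed in this model, which is one way to read
the remark that interface and buffering requirements are minimized
when both the multipliers and the memories are serial.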

APPLICATIONS

Aerospace applications of the array/network
concepts presented above include:

Nondestructive evaluation of structural parts
Materials technology
Physiological monitoring
ECM and ESM systems
Digital signal processing
Trajectory predictions
Target signature classifications
Radar refractive-index corrections
Detection of remote nuclear events
Voice data processing
Reconnaissance image processing
Avionics information systems

A number of these areas are briefly described
below.

NONDESTRUCTIVE EVALUATION

It has been shown that an adaptively trained,
nonlinear, multinomial network provides
accurate inferences of flat-bottom hole sizes
in ultrasonic nondestructive evaluation of
test specimens. The waveforms analyzed were
ultrasonic pulse echoes obtained from two
different sets of 7075-T6 aluminum area-amplitude
test blocks and three different transducers.

The eight flat-bottom hole defect sizes ranged
from 1/64- to 8/64-inch in steps of 1/64-inch.

The 15 input parameters found to be most
informative were those that describe the overall
shape and content (area) of parts of two
waveforms computed from each pulse echo waveform:
the power spectrum and its log Fourier transform,
the cepstrum. Maximum amplitude of the echo was
found not to be a discriminating parameter when
the material and/or transducer were changed.
to the accuracy of an adaptive network
classifier trained for discrimination between
remote underground nuclear events and deep-core
earthquakes (which masquerade as nuclear events
in many cases). The results show that nearly
perfect discrimination between the two classes
of remote events is obtained.

VOICE DATA PROCESSING

The application of adaptive networks to the
identification of spoken languages is discussed
in reference 16, which presents the results of
an investigation into multinomial networks
used to generate nonlinear features of 29
phoneme and phoneme-like parameters obtained from
speech waveforms. The networks are trained to
discriminate between each pair of languages
to be identified. The outputs of the networks
are then input to a decision logic that
identifies which language is being spoken. In
the case of five languages to be discriminated,
this means that a group of nonlinear transformations
is produced by ten individual networks, each of
which maps the 29-dimensional input space into its
respective component of a ten-dimensional output
space. The structure of a typical trained network
is shown in Fig. 9. The results of the cited
investigation show that significant improvement
is obtained in the accuracy of language
identification and in insensitivity to the idiosyncrasies
of individual speakers.

AVIONICS INFORMATION SYSTEMS

The authors believe the trend of adaptive network
applications is in the direction of a versatile
central avionics processor built around a
programmable network that performs many portions
of the avionics processing task. The central
processor will be fed by remote units that
perform signal conditioning and pre-processing
(parameterization) of signals at their sources.

These remote units may use small programmable
networks. The central processor will have, perhaps,
200 to 300 hardware algebraic elements in a network
that may be switched almost instantly (via
distributed storage of pre-learned connectivity and
coefficient information) between a large number of
uses. These uses might include fault monitoring;
resources management; real-time interpretation of
multisensor data; navigation, guidance, and control;
physiological monitoring; EW functions; and signal
analysis.

SUMMARY AND CONCLUSIONS

Methods for implementing and applying
electronically programmable arrays for numerical
processing have been explored. These arrays are
a subset of parallel array processors in
general, which include cellular arrays operating
on binary signals. The distinguishing feature
of the electronically programmable arrays
explored in this paper is their ability to
perform complex modeling (linear or nonlinear)
of an input data set to achieve a given output
result. The rationale and ways of synthesizing
and using such trainable multinomial networks
were described. The network realization choices
were put in perspective, from software simulation
to all hardware array population for maximum
throughput. The impact of advanced technology
in memory and logic was then related to the
programmable array hardware. Finally, a number
of application areas were described, ranging
from inspection of structural parts to high
speed electronic warfare signal processing.

It is felt that this type of programmable array,
now undergoing development with advanced LSI,
will find increasingly greater application. Its
inherent ability to be structured into a variety
of configurations to efficiently meet a broad
range of speed requirements (by trading off
hardware complexity) and its ability to perform
known linear as well as nonlinear transformations
(derived by training) impart a universality to
the programmable array.

The transformations which these arrays realize
can be used for application in the areas of
signal classification, pattern recognition, or
system control.
Abstract


The performance evaluation of a six-class, single
target classifier implemented using adaptive polynomial
networks is presented. The classifier was designed to
discriminate between six target classes: tracked
vehicles, wheeled vehicles, fixed wing aircraft, rotary
wing aircraft, personnel, and nuisances. This
discrimination is based on signal waveform features computed
from co-located seismic and acoustic transducers. The
instrument package is remote from the target and
signals are detected on each channel passively, as the
target moves within a range of signal detectability.

The data base used to synthesize and to test the
classifier consisted of field-recorded signals. A
principal result is that the feasibility of designing
a field usable seismic/acoustic signal classifier has
been demonstrated. It is shown that an average
overall six-class accuracy of 85 percent with good range
capability and reasonable site and speed independence
is achievable.


The data base consisted of 671 seismic and acoustic
signatures representing six major target categories.
Each signature consisted of data of 10 seconds
duration (that is, 10 simultaneous seconds for the
seismic and acoustic waveforms). The sampling rate was
2,000 Hz for both waveforms.

Data were recorded at three sites: Yuma, AZ;
Grayling, MI; and Ft. Bragg, NC, and signatures from
Classes 1, 2, and 6 were available at all three
locations. Aircraft data, Classes 3 and 4, were available
only at Ft. Bragg, and Class 5 data was available only
at Yuma. Historically, the most difficult classes to
discriminate have been Classes 1 and 2. The majority of
records in the data base were from those two classes;
338 of the 671 signatures (about 50 percent) were Class
1, 28 percent were Class 2 and the remainder of the data
were spread among the other four classes. (Seventy-two
percent of the Class 1 data were recorded at Yuma.)


Introduction 


A major Army requirement is to be able to classify
accurately remote targets via passive means. The
classifier system is part of the Army's REMBASS
(Remotely Monitored Battlefield Sensor System) project.
Usually acoustic (microphone) and seismic (geophone)
devices are co-located and a classifier must
discriminate up to six target classes based on parameters of
these two waveforms. The classifier must be reasonably
insensitive to signal variations caused by range,
speed, and location (i.e., environmental) factors.

Once a classification is made, the decision is radioed
to a remote station for tactical evaluation.

A program was initiated to demonstrate the
applicability of adaptive, nonlinear signal processing
techniques for accurate classification of acoustic and
seismic waveforms among six target classes.

Adaptive Polynomial Networks (APNs) were used in a
two-way, or pairwise, classification mode and the
overall six-way classification was rendered by a voting
procedure. The remainder of this paper describes the
data base employed, the extracted waveform parameters,
the classifier structure and synthesis, and a
discussion of the results.


Seismic/Acoustic Field-Recorded Data Base


The classes for target discrimination are the six
shown in Table 1. The class number will be used as a
convention throughout the remainder of this paper,
i.e., Class 1 denotes tracked vehicles, Class 2 denotes
wheeled vehicles, etc.

Table 1: Six Target Classes

Class Number   Type                   Abbreviation
1              Tracked Vehicle        TV
2              Wheeled Vehicle        WV
3              Fixed Wing Aircraft    FWA
4              Rotary Wing Aircraft   RWA
5              Personnel              PER
6              Nuisance               NUS


The signatures generated by different targets are
a function of many variables. Of primary importance are
target speed and distance from sensor (i.e., range).

Many combinations of these two conditions were available
in the data base. The speed versus range distribution
for land targets varied from 6 to 31 mph and from 0 to
900 meters for Classes 1 and 2, and up to 100 meters
for Class 6 (walking). Similar data for the air targets
were 120 to 450 knots for Class 3 and 60 to 100 knots
for Class 4. Altitudes of 200, 400 and 600 feet were
recorded for both classes.

The geophone employed had a resonant frequency of
7 Hz, a 70 percent damping ratio, and a pass band of
about 7 to 250 Hz. The microphone had a flat response
from 39 to 40,000 Hz. The acoustic response was
filtered to provide a useable pass band of about 20 - 500
Hz. Both seismic and acoustic signatures were
digitized at a rate of 2,000 Hz.
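The record size and sampling margin implied by these figures can be
checked with a few lines of Python. This is only a worked arithmetic
sketch; the constant and function names are hypothetical, while the
numbers are the ones quoted in the text.

```python
SAMPLE_RATE_HZ = 2000          # digitizing rate for both channels
RECORD_SECONDS = 10            # duration of one signature
SEISMIC_BAND_HZ = (7, 250)     # geophone pass band
ACOUSTIC_BAND_HZ = (20, 500)   # filtered microphone pass band

# Number of samples held for each channel of one 10-second signature.
samples_per_channel = SAMPLE_RATE_HZ * RECORD_SECONDS

# Highest frequency representable without aliasing at this rate.
nyquist_hz = SAMPLE_RATE_HZ / 2


def nyquist_margin(band_hz):
    """Ratio of the Nyquist frequency to the upper edge of the pass band."""
    return nyquist_hz / band_hz[1]


print(samples_per_channel)               # 20000
print(nyquist_margin(SEISMIC_BAND_HZ))   # 4.0
print(nyquist_margin(ACOUSTIC_BAND_HZ))  # 2.0
```

Both pass bands lie below the 1,000 Hz Nyquist frequency, with the
acoustic channel having the smaller margin.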

One of the main sources of confusion in the seismic
channel is due to the filtering effect of the
propagation medium (the earth) between target and geophone.
This varies from site-to-site, so that a given target
moving at a fixed speed at some distance from the
sensor will produce dramatically different time domain
signatures depending on the recording site. The
acoustic channel is not affected by environment. Its main
susceptibility is due to wind influences. Therefore,
each signal channel has somewhat complementary
advantages and disadvantages and this is one of the reasons
for interest in two-channel classifiers.

The 671 signals were divided into Design and
Evaluation subsets as shown in Table 2. The Design set
was used to synthesize the classifier. Its performance
was tested on the remaining (and unused) 446 records.

The records in the Design set were selected at random,
and represent about one-third of the data from each
class.
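With 446 of the 671 records held out, the Design set contains
671 - 446 = 225 records, or about one-third of the data. A random
split that takes roughly one-third of each class can be sketched as
below. This is an illustration only, not the authors' procedure: the
function name, the seed, and the per-class counts for Classes 3 to 6
are hypothetical (only the Class 1 count of 338 and the approximate
28 percent share of Class 2 come from the text).

```python
import random


def split_design_evaluation(records_by_class, design_fraction=1 / 3, seed=0):
    """Randomly assign about design_fraction of each class to the Design set."""
    rng = random.Random(seed)
    design, evaluation = [], []
    for label in sorted(records_by_class):
        shuffled = list(records_by_class[label])
        rng.shuffle(shuffled)
        n_design = round(len(shuffled) * design_fraction)
        design.extend((label, record) for record in shuffled[:n_design])
        evaluation.extend((label, record) for record in shuffled[n_design:])
    return design, evaluation


# Class 1 count is from the text; Class 2 is about 28 percent of 671.
# Counts for Classes 3 to 6 are hypothetical placeholders that total 671.
CLASS_COUNTS = {1: 338, 2: 188, 3: 40, 4: 35, 5: 30, 6: 40}
records_by_class = {
    label: [f"class{label}_record{n}" for n in range(count)]
    for label, count in CLASS_COUNTS.items()
}
design_set, evaluation_set = split_design_evaluation(records_by_class)
```

Splitting class by class keeps the rarer classes represented in both
subsets, which a single random draw over all 671 records would not
guarantee.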
Table 2: Composition of Design and Evaluation
Data Subsets


It was found from the correlation matrix of the
first 28 features that some of the features were highly
correlated. An eigenvector analysis showed that only
nine uncorrelated variables accounted for over [...]
percent of the data variance. So, ten uncorrelated
features were defined from the 29 for classifier synthesis.


Waveform Parameterization


The 29 waveform parameters (i.e., features) that
were extracted from the seismic and acoustic channels
for one 10-second portion of a target run (usually
lasting up to 100 seconds) are listed in Table 3.
Parameters from each channel appear in two groups
because these features were originally defined by the
Honeywell (1-18) and Sylvania (19-28) companies.

Thus, each 10-second run was represented as a
29-component feature vector.

Table 3: Seismic and Acoustic Waveform Features

Channel    No.   Feature                    Range (Counts/Epoch)
Seismic     1.   Zero Crossing 1            0-1,240
            2.   Zero Crossing 2            0-220
            3.   Zero Crossing 3            0-49
            4.   Zero Crossing 4            0-105
            5.   Time Between Events 1      0-15
            6.   Time Between Events 2      0-40
            7.   Time Between Events 3      0-14
            8.   Time Between Events 4      0-21
            9.   Smoothness                 0-46
           10.   Duty Cycle Consistency     0-397
           11.   High Frequency Energy      0-30
           12.   Low Frequency Energy       0-40
Acoustic   13.   Zero Crossing 1            0-2,518
           14.   Zero Crossing 2            0-944
           15.   Zero Crossing 3            0-516
           16.   Zero Crossing 4            0-328
           17.   Duty Cycle Consistency     0-166
           18.   Roughness Count            0-167

Feature                                     Range (Volts)
Low Frequency Energy                        0-6
Low Band Envelope Variance                  0-6
Wide Band Envelope                          0-6
Wide Band Envelope Variance                 0-6
Frequency                                   0-6
Frequency Variance                          0-6
Variance of Frequency Variance              0-6
High Frequency Energy                       0-6
Wide Band Envelope                          0-6
Low Band Envelope                           0-6
Logarithm of (Acoustic/Seismic)
Energy Ratio                                Dimensionless


where the x_i (i = 1, ..., 29) are the 29 features of
Table 3, and each coefficient is the ith component of
the kth eigenvector (i = 1, ..., 28; k = 1, ..., 9).
The ten x' variables were used as inputs to the APN
classifier.
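The mapping from the 29 raw features to the ten x' inputs can be
sketched in Python as follows. The defining equation is not legible in
this copy, so the sketch rests on two assumptions that should be
checked against the original: each of the first nine x' values is the
weighted sum of the first 28 features with the components of one
eigenvector as weights, and the tenth x' value is the 29th feature
passed through unchanged. The function name and the toy eigenvectors
are hypothetical.

```python
def decorrelate(features, eigenvectors):
    """Map 29 raw features to 10 classifier inputs (assumed form, see lead-in).

    features     -- sequence of 29 values, x_1 ... x_29
    eigenvectors -- 9 sequences of 28 components each
    """
    if len(features) != 29:
        raise ValueError("expected 29 features")
    if len(eigenvectors) != 9 or any(len(v) != 28 for v in eigenvectors):
        raise ValueError("expected 9 eigenvectors of 28 components")
    first_28 = features[:28]
    # Assumed: x'_k is the sum over i of (ith component of eigenvector k) * x_i.
    projected = [sum(e * x for e, x in zip(vector, first_28))
                 for vector in eigenvectors]
    # Assumed: the 29th feature is carried through as the tenth input.
    return projected + [features[28]]


# Toy check with unit vectors standing in for real eigenvectors:
# vector k simply selects feature k.
toy_vectors = [[1.0 if i == k else 0.0 for i in range(28)] for k in range(9)]
x_prime = decorrelate([float(n) for n in range(1, 30)], toy_vectors)
```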

Classifier  Structure 

The structure of the classifier is shown in
Figure 1. It consisted of two parts: (1) 15
subclassifiers to discriminate between all possible
nonrepetitive pairs of classes, and (2) decision logic
for rendering one, overall six-way target
classification. Each of the 15 subclassifiers is an APN and the
methodology associated with their synthesis has been
summarized previously. An APN was trained via the
methods in these references, using the
data described above for each of the fifteen possible
target class pairwise decisions.
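The count of 15 subclassifiers follows from the number of
nonrepetitive pairs among K classes, K(K-1)/2, which is 15 for K = 6.
A minimal Python sketch enumerates them; the function name is
hypothetical, and the class abbreviations are those of Table 1.

```python
from itertools import combinations

CLASS_ABBREVIATIONS = {1: "TV", 2: "WV", 3: "FWA", 4: "RWA", 5: "PER", 6: "NUS"}


def pairwise_tests(class_numbers):
    """Return every (i, j) pair with i < j, one per pairwise subclassifier."""
    return list(combinations(sorted(class_numbers), 2))


pairs = pairwise_tests(CLASS_ABBREVIATIONS)
for i, j in pairs:
    print(f"{CLASS_ABBREVIATIONS[i]} vs. {CLASS_ABBREVIATIONS[j]}")
```

Each class takes part in K - 1 = 5 of the tests, so five is the
largest number of votes any one class can receive.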

[Figure 1 block labels: Features; Fifteen Pairwise
Discriminant Functions (Subclassifiers); Votes;
Decision Logic; Estimated Class]


Note that certain of these features are conceptually
the same; however, their methods of extraction were
different and therefore all the features were
included for completeness. The similar features are:

Seismic Zero Crossings (1-4) - Seismic Frequency (19)

Seismic Smoothness (9) - Seismic Wide Band
Frequency (23)

Seismic Duty Cycle Consistency (17) - Seismic
Frequency Variance (24)

Seismic High and Low Frequency Energy (11 and 12) -
Seismic Wide Band Envelope (27)


Figure 1. "One-Versus-One" (Pairwise) Classifier
Architecture.


Nonlinear pairwise ("one versus one") APN
discriminant functions were synthesized by combining
pairs of inputs, x_i and x_j, into building-block
polynomial elements according to the equation:

    y = w_0 + w_1 x_i + w_2 x_j + w_3 x_i x_j + w_4 x_i^2 + w_5 x_j^2

A nonlinear discriminant function may consist of layers
of such elements combined to model a given dependent
variable. Each layer may consist of as many elements
as the number of pairwise combinations of the input
parameters presented to that layer, but only the most
discriminating are retained. These elements can then,
in turn, be used as inputs to the next layer of the
network. The structures of the 15 nonlinear APN
discriminant functions which were synthesized, along
with the weight coefficients, are in a recent report.
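A two-input building-block element and a small layered combination of
such elements can be sketched in Python as below. This is a minimal
illustration that assumes the six-term quadratic form in two inputs
(constant, two linear terms, a cross product, and two squares); the
function names, the network wiring, and all weight values are
hypothetical, and the training step that selects elements and fits the
weights is not shown.

```python
def element(weights, x_i, x_j):
    """Six-term polynomial of two inputs (assumed form of the building block)."""
    w0, w1, w2, w3, w4, w5 = weights
    return (w0 + w1 * x_i + w2 * x_j
            + w3 * x_i * x_j + w4 * x_i ** 2 + w5 * x_j ** 2)


def two_layer_discriminant(x, first_layer, output_weights):
    """Evaluate a two-layer network of elements.

    x              -- list of input values
    first_layer    -- two (i, j, weights) triples naming the inputs each
                      first-layer element combines
    output_weights -- weights of the element that combines the two
                      first-layer outputs
    """
    hidden = [element(w, x[i], x[j]) for i, j, w in first_layer]
    return element(output_weights, hidden[0], hidden[1])


# Hypothetical weights, chosen only so the arithmetic is easy to follow.
single = element((1, 2, 3, 4, 5, 6), 1.0, 2.0)
network_output = two_layer_discriminant(
    [1.0, 2.0, 3.0],
    [(0, 1, (0, 1, 1, 0, 0, 0)),    # x_0 + x_1
     (1, 2, (0, 0, 0, 1, 0, 0))],   # x_1 * x_2
    (0, 1, -1, 0, 0, 0),            # first output minus second output
)
```

Because each layer feeds products and squares of the previous layer's
outputs forward, the degree of the overall polynomial can double with
every added layer.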

The decision hypersurface of each pairwise
subclassifier is such that one target class tends to be
mapped into a fixed value below its threshold and the
other target class into a fixed value above. The
usual convention is that a pairwise discriminant
function attempts to map all class i members onto the
number +1.0 and all class j members onto the number
-1.0, with the discrimination threshold set midway
between these two values, i.e., set to zero.

In a pairwise test, the closer the computed
discriminant function output is to one of the numbers +1
and -1 (using a suitable metric such as squared
normalized difference), the greater the confidence that
can be placed in the constituent decision. A
tie-breaking strategy that exploits this confidence
information is illustrated by means of the following
example.


Table 4 contains the hypothetical outputs of the
15 pairwise tests for one 10-second record. It can
be seen that with a threshold equal to zero, a positive
output renders a "unit vote" for Class i and vice versa
for a negative output. (The value "i" is always less
than "j".) In this illustrative example, Classes 1
and 2 are tied with four votes each. Classes 3, 4, 5,
and 6 are eliminated from further consideration.


Table 4: Illustration of Tie-Breaking Strategy

No.   Class i Versus Class j   Output   Winning Class
 1    1 vs. 2                  +1.01    1
 2    1 vs. 3                  +0.91    1
 3    1 vs. 4                  -0.13    4
 4    1 vs. 5                  +0.85    1
 5    1 vs. 6                  +1.15    1
 6    2 vs. 3                  +0.69    2
 7    2 vs. 4                  +0.71    2
 8    2 vs. 5                  +0.87    2
 9    2 vs. 6                  +0.85    2
10    3 vs. 4                  -0.88    4
11    3 vs. 5                  +0.92    3
12    3 vs. 6                  +1.06    3
13    4 vs. 5                  -0.78    5
14    4 vs. 6                  +0.91    4
15    5 vs. 6                  +0.93    5

Voting Logic

Target Class   Votes
1              4
2              4
3              2
4              3
5              2
6              0


The following measure of confidence was used
in the case of ties:

    M_k = \sum_{m=1}^{K(K-1)/2} V_{mk} \, | 1 - |x_m| |

where

K = Number of classes

x_m = Actual output (of the mth test) for Class k

V_{mk} = "Selection operation"; V_{mk} = 0 if Class k is
not involved in the mth test or if k is involved
but loses; V_{mk} = 1 if Class k is involved in the mth
test and wins.

M_k is equal to zero if the actual outputs, x_m,
always equal unity. Larger values of M_k usually denote
weaker decisions. Therefore, a tie-breaking strategy
is to evaluate M_k for those classes k in contention and
to choose that class for which the M_k value is smallest.

Continuing with the example given in Table 4, the
associated M_1 and M_2 values for the tied Classes 1 and
2 are

    M_1 = |1-1.01| + |1-0.91| + |1-0.85| + |1-1.15| = 0.40
    M_2 = |1-0.69| + |1-0.71| + |1-0.87| + |1-0.85| = 0.88

Therefore, Class 1 is the classified target due to
its obtaining the larger degree of confidence.
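The voting and tie-breaking steps of this example can be traced with
the Python sketch below. It is an illustration under stated
assumptions rather than the authors' implementation: the function
names are hypothetical, the confidence measure is taken to be the sum
of |1 - |output|| over the tests a class wins, and the magnitudes used
for tests 3 vs. 4 and 3 vs. 6 are a best reading of partly illegible
entries (only their signs affect the result).

```python
# Outputs of the 15 pairwise tests, keyed by (i, j) with i < j.
# Magnitudes for (3, 4) and (3, 6) are uncertain readings of the source.
TABLE_4_OUTPUTS = {
    (1, 2): +1.01, (1, 3): +0.91, (1, 4): -0.13, (1, 5): +0.85, (1, 6): +1.15,
    (2, 3): +0.69, (2, 4): +0.71, (2, 5): +0.87, (2, 6): +0.85,
    (3, 4): -0.88, (3, 5): +0.92, (3, 6): +1.06,
    (4, 5): -0.78, (4, 6): +0.91, (5, 6): +0.93,
}


def winner(pair, output, threshold=0.0):
    """A positive output votes for class i, a negative output for class j."""
    i, j = pair
    return i if output > threshold else j


def count_votes(outputs, n_classes=6):
    votes = {k: 0 for k in range(1, n_classes + 1)}
    for pair, y in outputs.items():
        votes[winner(pair, y)] += 1
    return votes


def confidence_measure(k, outputs):
    """Sum of |1 - |output|| over the tests that class k wins (smaller is better)."""
    return sum(abs(1.0 - abs(y))
               for pair, y in outputs.items() if winner(pair, y) == k)


def classify(outputs):
    """Pick the class with the most votes; break ties with the smallest measure."""
    votes = count_votes(outputs)
    top = max(votes.values())
    tied = [k for k, v in votes.items() if v == top]
    return min(tied, key=lambda k: confidence_measure(k, outputs))


votes = count_votes(TABLE_4_OUTPUTS)
m1 = confidence_measure(1, TABLE_4_OUTPUTS)
m2 = confidence_measure(2, TABLE_4_OUTPUTS)
decision = classify(TABLE_4_OUTPUTS)
```

Run on the Table 4 values, the sketch gives four votes each to Classes
1 and 2 and resolves the tie in favor of Class 1, whose measure is the
smaller of the two.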

As an alternative to tie-breaking, it may be
desirable to report all classes receiving at least V
votes. For example, a classifier can be regarded as
having produced a correct response whenever it generates
at least V votes for the true class.
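Vote reporting can be sketched in a few lines of Python. The function
names are hypothetical, and the reading that a response counts as
correct when the true class is among those reported is an assumption.
The vote counts are those of the Table 4 example.

```python
def report_classes(votes, v_min):
    """Return every class that received at least v_min votes."""
    return sorted(k for k, n in votes.items() if n >= v_min)


def is_correct(votes, true_class, v_min):
    """Assumed scoring rule: correct if the true class is among those reported."""
    return true_class in report_classes(votes, v_min)


table_4_votes = {1: 4, 2: 4, 3: 2, 4: 3, 5: 2, 6: 0}
reported = report_classes(table_4_votes, v_min=4)
```

With a threshold of four votes, the example record is reported as
Class 1 or Class 2, leaving the final judgment to the user.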

These and other voting logic procedures have been
reported in detail previously.

Performance Evaluation

Two sets of confusion matrices have been generated
for the nonlinear "one-versus-one" classifier. The
first set, shown in Table 5, was obtained by
application of the tie-breaking decision logic; and the second
set, shown in Table 6, was obtained by the vote-reporting
technique with V = 4, as described above. It was
also found that this classifier is reasonably
independent of range between target and sensor. Although only
a few signatures were available at large ranges, no
misclassifications were made for tracked vehicles
beyond 600 meters. The accuracies shown in Tables 5 and
6 represent a significant increase over previous
efforts.


In evaluating the performance of a classifier,
three criteria should be taken into consideration.

These criteria reflect the ability of a classifier to
perform both accurately and consistently. Thus the
performance of the classifier was defined to be a
function of three quantities:

1. A - Overall accuracy

2. C - Consistency of overall accuracy

3. S - Site independence

The overall accuracy, A, was defined as the number
of correct decisions divided by the total number of
decisions for the given classifier. The value of A can
range from 0, for total error, to 1, for perfect
classification.

Since it is desirable for a classifier to perform
equally well for all six target classes, a measure of
accuracy consistency, C, was constructed as follows.

The average accuracy and its standard deviation, σ,
were computed over all six classes, considering all
three sites. If the average accuracy was the same for
all six classes, σ was zero. Conversely, a large value
of σ denotes inconsistent classifications. Therefore,
the consistency measure was computed as: C = 1 - σ.

The value of C can range from 0.45, for inconsistent
classifications, to 1, for perfectly consistent
classifications.
Table 5. Confusion Matrices (Evaluation Data Set)
Using Tie Breaking Procedures

Site         Correct/Total   Accuracy (percent)
Ft. Bragg    91/117          78
Grayling     73/90           81
Yuma         213/239         89

Overall Accuracy   377/446   85

Table 6. Confusion Matrices (Evaluation Data Set)
Using Vote Reporting Procedures

Site         Correct/Total   Accuracy (percent)
Ft. Bragg    96/117          82
Grayling     78/90           87
Yuma         216/239         90

Tlx-  }»-r(  r'5iM» 

\>,n  bv  ' ' 

tiu:  !*ri- 

Vot  <■  H'V  '*1  U4T  W ' 


• . - th-  AIN  • ! . 
U If" 

A C 

, •<  J • J th. 


.It  1<T  >t  ti 

: .l. ^ au'i  (i 

b'/h  7ui 

,7r>i 


ll-iiT" tori',  t>  'ti  ■■  ' iiu-  'i«i'  . r ■•;  i !■  o;.!  . 

^Tll.'■l^^url  1.  P f h'  "<t"  i"t»  i'l 

giviniT  l«tii*r  i. .n i i , 


Summary


It has been demonstrated that a seismic/acoustic
six-way target classifier can be realized that, in
simulations, exhibits improved overall accuracy,
improved tie-breaking logic, improved site
invariance, and improved range invariance. The specific
conclusions reached are:

1. An average single-epoch classification
accuracy of 85 percent can be realized with
a practicable design.

2. The above accuracy is achieved with high
consistency at different sites and for the
different target classes, and the classifier
is relatively insensitive to target range
(out to the periphery of the target detection
zone) and the speed, altitude (where
applicable), and heading of the target.

3. By utilizing a pairwise voting logic
structure, the classifier circuitry is potentially
less prone to manufacturing tolerance errors
and to parameter drift.

4. Using vote reporting in lieu of class
reporting, the voting structure is also suitable
for multi-target classifications.
Furthermore, the likelihood of (intentional or
unintentional) jamming of the sensor is reduced
and the user has greater opportunity to
exercise judgment concerning the tactical
situation.


Overall (Table 6)   390/446 = .87


The site independence measure, S, was obtained as
follows. The six class accuracies were computed for
each of the three sites. The value of S was set equal
to the ratio of the lowest site accuracy to the best
site accuracy. Thus, if a classifier performed well at
one or more sites, but poorly at one of the other
sites, S was small. Conversely, S approached 1 as a
given classifier produced consistent, i.e., the same,
accuracy at all sites.

The overall performance measure was computed as
the product of the three criteria of success:

    P = A × C × S

Good performance exists when A, C, and S each approach
1, as does P. Therefore, any group of classifiers can
be rated on the above performance scale, which ranges
from 0 for poor performance to 1 for perfect
performance.


5. Further work is needed to develop the most
cost-effective classifier design. As a
foundation for this work, additional field
and/or synthetic data should be obtained so
as to represent more fully the wide variety
of targets and terrain conditions that would
be encountered by an operational system.

Acknowledgment

This work was supported by the U.S. Army Mobility
Equipment Research and Development Center (MERDC) under
Contract No. IvVlKiti 71 ‘ ddZZ. The authors thank Dr.
H. K. Young of MERDC for his help and guidance
throughout the project.

References

1. Barron, R. L., "Learning Networks Improve Computer-
Aided Prediction and Control," Computer Design,
August 1975.

? Hunt.  S.  P. , M l>.  l.iyiuui.  ami  D.  L.  Wilson. 

Siiiimlc  Anxr.tli  T iiget  Cln5kst  I ter . (TTT.  Sylvania, 


Notice that a classifier that had an overall
accuracy of 90 percent (A = 0.9), with a standard
deviation of 10 percent (C = 1 - 0.1 = 0.9), and a
worst-site-to-best-site accuracy ratio of 90/100
(S = 0.9/1.0 = 0.9), all very good values, would
achieve a performance value of

    P = 0.9 × 0.9 × 0.9 = 0.729.
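The three criteria and their product can be computed with the short
Python sketch below. The function names are hypothetical. The use of
the sample standard deviation (divisor n - 1) is an assumption; it is
chosen because it reproduces the stated lower limit of 0.45 for C when
three of six class accuracies are 0 and three are 1. The site totals
are the legible correct/total counts of Table 5.

```python
from statistics import stdev


def overall_accuracy(correct, total):
    """A: correct decisions divided by total decisions."""
    return correct / total


def consistency(class_accuracies):
    """C = 1 - sigma, using the sample standard deviation (assumed)."""
    return 1.0 - stdev(class_accuracies)


def site_independence(site_accuracies):
    """S: ratio of the lowest site accuracy to the best site accuracy."""
    return min(site_accuracies) / max(site_accuracies)


def performance(a, c, s):
    """P = A x C x S."""
    return a * c * s


# Worked example from the text.
example_p = performance(0.9, 0.9, 0.9)

# Least consistent case for six classes: three at 0 and three at 1.
worst_case_c = consistency([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Site totals (correct, total) as read from Table 5.
table_5_sites = {"Ft. Bragg": (91, 117), "Grayling": (73, 90), "Yuma": (213, 239)}
a_table_5 = overall_accuracy(
    sum(c for c, _ in table_5_sites.values()),
    sum(t for _, t in table_5_sites.values()),
)
s_table_5 = site_independence([c / t for c, t in table_5_sites.values()])
```

Because P is a product, a weakness in any one criterion pulls the
overall rating down even when the other two are high, as the 0.729
example shows.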


im-  nil  to  U'.A  MtJtiiib  Omtract"  Dib\Ka2-72-C-0&1i 
Jumi  19i). 

3. Mucciardi, A. N., "Elements of Learning Control
Systems With Applications to Industrial
Processes," Proc. 1972 IEEE Conference on Decision
and Control, New Orleans, LA, December 13-15,
CMOS/SOS Serial-Parallel Multiplier


DANIEL HAMPEL, SENIOR MEMBER, IEEE, KENYON E. McGUIRE, MEMBER, IEEE, AND KALMAN J. PROST


Abstract - A 24-bit serial-parallel multiplier was integrated in CMOS/
silicon-on-sapphire (SOS) technology on a 155 mil × 170 mil chip. The
operation of this multiplier is described, showing how the parallel
loaded multiplier x combines with the serial loaded multiplicand, a, to
form the serial product. An addend, b, can also be accommodated to
produce ax + b. The design of the multiplier cells is based on
functional majority logic adders and weak or trickle inverter master-slave
latches. The chip operates at clock rates up to 18 MHz. Power
dissipation at 10 MHz and V_DD of 5 V is about 20 mW, and the energy
consumption for multiplying two 16-bit numbers is about 64 nJ.
Typical application areas are mentioned.


Manuscript received April 14, 1975; revised June 6, 1975. This work
was supported by the Air Force Avionics Laboratory under Contract
F33615-73-C-1089, "LSI Electronically Programmable Arrays," under the
guidance of C. Gwinn.

The authors are with the RCA Government Communications and
Automated Systems Division, Somerville, N.J.


Introduction 

Advances in digital signal processing are being made
possible by improved large-scale integrated (LSI)
multipliers. Such multipliers have been built in a variety of
bipolar and MOS technologies for use in different types of
processors. Basically, two types of multiplier architectures
are of interest for high performance: all parallel [1], [2] and
serial-parallel [3]-[5]. The all-parallel multiplier has, as a
minimum, a total of N^2 adders, as well as partial product
gates for an N × N capacity. The serial-parallel multiplier has
N adders and 2N latches (for storing and shifting) to
successively accumulate the outputs of N partial product gates. The
multiplicand must be available in serial while the multiplier is
fed in parallel; the product output is also serial. The serial-
parallel multiplier is thus much simpler to implement (as will
be seen) and requires fewer connections. However, the time
for a multiplication of two N-bit numbers is 2N × 1/f, where
f is the control clock frequency governing the multiplicand
input rate and output product. The speed of the parallel array
multiplier, on the other hand, is generally dependent on the
asynchronous ripple through time of the adder net, and for a
given technology is substantially faster than the serial-parallel
multiplier. For many types of digital signal processors, where
many multiply operations can be done simultaneously, the
lower speed single multiply of the serial-parallel approach can be
compensated for. This is done by using word-parallel, bit-serial
processing, employing several multipliers. Examples of
processors which can optimally use such multipliers include fast
Fourier transform (FFT) arithmetic units [4], digital filters
[5], and programmable arrays used for complex, nonlinear
transformations [6] for classification, modeling, and control.

Fig. 1. LSI serial-parallel multiplier functional diagram.

The serial nature of the multiplier to be described, using a
predominance of on-chip processing, and the desire to pack a 24-bit
array on a single chip all lead to selecting CMOS/silicon-on-sapphire
(SOS) as an ideal technology choice for implementing the multiplier.
This paper will describe the logic and circuit design of the
multiplier, give performance results of samples of fabricated units,
and suggest applications.

Multiplier Description

Fig. 1 is the functional block diagram of the multiplier that was
integrated. The module handles the multiplicand in two's complement
form, serially fed, and the multiplier in sign-magnitude form, with a
sign bit and up to 23 significant bits stored in parallel. Modules may
be cascaded to form higher order terms or to handle larger multipliers.

The  module  contains  the  following. 

1) A 24-bit (23 bits plus sign) holding register for the multiplier
input. This register is organized as three 8-bit serial-in,
parallel-out registers as a compromise between the number of
multiplier input connections and time required to load the register.
A separate clock controls loading of the multiplier register.

2) 23 partial product gates (AND gates) and 23 adder-latch (Σ-L)
stages which actually perform the multiplication, either by adding the
contents of the multiplier register and shifting the contents to the
right or by shifting right only, depending on the multiplicand input.
By providing access to the input of the first adder, an addend can be
fed in, synchronously with the multiplicand, effectively giving the
multiplier the capability to perform an ax + b calculation.

The outputs of stages 1, 7, 15, and 23 are made available so that the
cell may be used with 8-, 16-, or 24-bit multipliers (the additional
bit being the sign), as well as a 1-bit adder.

The serially fed multiplicand and addend can be of any desired length.

3) Control logic providing the following functions. The "two's
complement" flip-flop and an associated exclusive-OR gate (not shown)
will two's complement the multiplicand if the multiplier sign bit is a
1. This operation results in a product which is in two's complement
form, as shown in Table I.

The "sign hold" flip-flop serves two functions: it separates the
propagation delay of the two's complementer circuit from that of the
adder-latch input stages, and it provides the sign hold function.
Propagation of the multiplicand through the two's complement flip-flop
and its associated logic increases the multiplicand input delay to the
adder/latches. This increased delay would limit the overall speed of
the multiplier by reducing the multiplicand throughput rate. By
splitting up the multiplicand input delays, higher multiplicand
throughput is achieved at the expense of a 1-bit initial delay in the
multiplicand input.

TABLE I
TWO'S COMPLEMENTER OPERATION

MULTIPLIER        MULTIPLICAND       TWO'S COMP      CORRECTED MULTIPLICAND      PRODUCT
SIGN  MAGNITUDE   SIGN  MAGNITUDE    MULTIPLICAND    SIGN  MAGNITUDE             SIGN(1)  MAGNITUDE(2)
 +       X         +     Y               NO           +     Y                      +      XY
 +       X         -     2^N - Y         NO           -     2^N - Y                -      X2^N - XY
 -       X         +     Y               YES          -     2^N - Y                -      X2^N - XY
 -       X         -     2^N - Y         YES          +     2^N - (2^N - Y) = Y    +      XY

1. THE PRODUCT SIGN IS EQUAL TO THE CORRECTED MULTIPLICAND SIGN.

2. X2^N - XY IS A VALID TWO'S COMPLEMENT FORM IF THE PRODUCT IS
TRUNCATED TO ELIMINATE OVERFLOW BITS BEYOND THE SIGN BIT POSITION.
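The sign handling summarized in Table I can be checked numerically. The following is a minimal sketch, not the chip's logic: the helper names and the choice of a 2N-bit truncated product width are assumptions made for illustration only.

```python
def to_twos(value, bits):
    """Encode a signed integer as an unsigned two's complement word."""
    return value & ((1 << bits) - 1)


def from_twos(word, bits):
    """Decode an unsigned two's complement word back to a signed integer."""
    return word - (1 << bits) if word >> (bits - 1) else word


def product_via_table(mult_negative, x_mag, y_signed, n_bits):
    """Follow Table I: complement the multiplicand when the multiplier
    sign is 1, multiply by the magnitude X, then drop overflow bits."""
    out_bits = 2 * n_bits                      # assumed product width
    corrected = -y_signed if mult_negative else y_signed
    word = to_twos(corrected, out_bits)        # sign-extended multiplicand
    raw = x_mag * word                         # e.g. X*2^N - X*Y
    return from_twos(raw & ((1 << out_bits) - 1), out_bits)


if __name__ == "__main__":
    for neg in (False, True):
        for y in (3, -3):
            print(neg, y, product_via_table(neg, 5, y, 4))
```

Truncating the raw product is what removes the X2^N term, which is the point of note 2 under the table.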

Operation

First, the multiplier must be stored in its input register. If an
addend, b, is to be added to the product (ax), its least significant
bits (LSB) must be loaded before the multiplicand is entered.

Since bits of equal significance must be added together, the first 22
bits of the addend must be shifted into the cell before multiplication
starts. This preshifting will cause the addend LSB (now at the output
of cell 22) to be added to the product LSB in the 23rd adder-latch
stage. When multiplication starts, the addend input will correspond to
the 2^22 bit in the addend. Then the multiplication begins.

When the multiplicand sign bit has been entered into the cell, the 24
LSB's of the product have been clocked out of output O_23. Since the
product of two 23-bit plus sign numbers contains 46 bits plus sign,
the remaining 22 significant bits and the new sign bit information are
still within the 23 cell stages. This information cannot be merely
shifted out, as carries from previous additions must be allowed to
modify this information.

This problem can be solved by making the multiplicand 47 bits long;
leading nonsignificant bits can be inserted between the multiplicand
sign bit and the most significant bit (MSB). Since the multiplicand is
in two's complement form, leading nonsignificant bits are always
identical to the sign bit. By "stretching" the sign bit 23 additional
places, the effect of lengthening the multiplicand to 47 bits is
achieved. In this way all of the product bits are shifted out of O_23.
As indicated in Table I, overflow bits beyond the sign must be
eliminated to preserve the two's complement form of the output.
Sign-bit stretching is accomplished by disabling the clock input to
the sign hold flip-flop when the sign bit is in that flip-flop.

The output of this flip-flop becomes an "enable" signal which will
allow the multiplier bits to be added to the adder-latch contents when
this signal is a 1. The shift-and-add/shift only algorithm mentioned
earlier is thus implemented.
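The shift-and-add/shift-only procedure with sign-bit stretching can be sketched behaviorally as follows. This is an illustrative model under stated assumptions, not a gate-level description of the chip: the function name and parameters are hypothetical, the addend bit is simply added at equal significance (the 22-bit preshift through the stages is not modeled), and the 1-bit sign hold delay is ignored.

```python
def serial_parallel_multiply(a_mag, x, x_bits, a_negative=False,
                             addend=0, stages=23):
    """Behavioral model of a bit-serial a*x + b computation.

    a_mag      : multiplier magnitude, held in parallel (sign-magnitude)
    x          : signed multiplicand, fed LSB first in two's complement
    x_bits     : multiplicand word length, sign included
    a_negative : multiplier sign; True makes the model negate x first
    """
    if a_negative:
        x = -x                       # role of the two's complementer
    total = x_bits + stages          # clocks needed to flush every product bit
    mask = (1 << total) - 1
    x_word = x & mask                # sign bit "stretched" over the extra clocks
    b_word = addend & mask
    acc = 0                          # contents of the adder-latch chain
    product = 0
    for clk in range(total):
        acc += (b_word >> clk) & 1   # addend bit of equal significance
        if (x_word >> clk) & 1:      # multiplicand bit enables the add
            acc += a_mag
        product |= (acc & 1) << clk  # one product bit leaves per clock
        acc >>= 1                    # shift right
    if product >> (total - 1):       # reinterpret as two's complement
        product -= 1 << total
    return product


if __name__ == "__main__":
    print(serial_parallel_multiply(5, -3, 8))            # a*x
    print(serial_parallel_multiply(5, 3, 8, addend=7))   # a*x + b
```

Bits that would carry beyond the final clock are simply never collected, which corresponds to eliminating the overflow bits beyond the sign.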

The module can perform other functions in addition to the basic
ax + b operation. By placing zeros in the multiplier register, the
cell functions as a 23-stage shift register with the addend lead as
the register input. Any shift register length less than 23 stages is
also realizable by placing a single 1 in the multiplier, in the
appropriate position. By placing a word with a 1 only in the Stage 1
position in the multiplier register, only Stage 1 will function as an
adder, while the other stages merely shift, causing the module to
function as a serial adder and register. Placing a 1 on the "negate
product" lead results in a serial subtractor and register.

Circuit  Design 

The  main  area  consumption  factors  of  the  cell  are  the  adders 
and  latches  used  for  the  23  stages  of  the  multiplier. 

Fig. 2(a) is the block diagram for the repetitive portion of the
multiplier representing the bulk of the chip. All latches are based on
the trickle inverter master-slave sections shown in Fig. 2(b). This
includes the multiplier register, and sum and carry latches. Only the
sum latch has a parallel reset capability. The partial product AND
gate is realized with a transmission gate, shown feeding the Y input
to the full adder. The design of the adder and latches will be
described.

The trickle inverter register cell combines the simplicity of the
dynamic register cell with the advantages of full static operation. A
special inverter is designed with a greatly reduced output drive
capability so as not to interact with the input signal. Even with a
transconductance 24 times less than the normal inverter, this trickle
inverter still provides more than 10 times the current required to
cancel the highest leakage currents observed in SOS-CMOS circuits,
e.g.,

maximum output current of trickle inverter = 86 μA at 10 V

maximum observed leakage = 8 μA/gate.

The trickle inverter will hold the logic state indefinitely, as does
the conventional static register cell. When new data are read into the
cell, the trickle inverter is overdriven by a source conductance at
least 5 times greater than the trickle inverter output so as to avoid
significant increases in stage delay. Advantages of the trickle
inverter cell over a conventional static register cell include reduced
chip size, reduced loading of clock lines, and less sensitivity of
this circuit to effects of clock skewing.

The logic and circuit schematic for the full adder shown in Fig. 3
uses two functional gates to provide C_out and Sum, each realizing the
reduced majority function of its inputs. The first gate produces C_out
(low) as two or more out of the three inputs are low. This signal
(C_out) feeds the second gate with a double-weighted value along with
the three input signals. The sum (high) output then reflects whether
three out of the five (or more) weighted inputs are high. The major
advantage of this adder is its minimized transistor requirements.
Total transistors used, including inverters used for driving as well
as polarity reversal, is 28. The inverters have twice the channel
width of the gate devices. Simulations of this circuit have indicated
an approximate delay of 10 ns for carry-out and 20 ns for sum-out.
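The two-gate majority arrangement described above can be illustrated with a small truth-table check. This sketch uses positive logic throughout and ignores the inverting polarity of the actual CMOS gates; the function name is hypothetical.

```python
from itertools import product


def majority_full_adder(a, b, c_in):
    """Full adder built from two threshold (majority) functions.

    Gate 1: carry is 1 when two or more of the three inputs are 1.
    Gate 2: sum is 1 when three or more of five weighted inputs are 1,
            the five being a, b, c_in and the inverted carry counted twice.
    """
    carry = 1 if (a + b + c_in) >= 2 else 0
    not_carry = 1 - carry
    total = 1 if (a + b + c_in + 2 * not_carry) >= 3 else 0
    return total, carry


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        s, cy = majority_full_adder(a, b, c)
        print(a, b, c, "->", s, cy)
```

The double weighting of the inverted carry is what lets a single threshold gate distinguish one or three high inputs (sum = 1) from zero or two (sum = 0).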

Performance

The integrated circuit is shown in Fig. 4. The initial samples were
fabricated by a deep depletion process, yielding devices with a V_T of
2 V for P devices and a V_T of 1 V for N devices. A special test and
control unit was constructed, employing Schottky TTL parts, to load
and exercise the multiplier. Also, the chip multiplier logic was
duplicated in Schottky TTL for use as a reference for test and
evaluation of the fabricated chips. The TTL version required 35 IC's,
dissipating 7.5 W. A complete set of test waveforms demonstrating the
operation of the chip is shown in Fig. 5, in performing an ax + b
calculation. In the example shown, a is the 24-bit multiplier,
preloaded as three 8-bit words, x is the 24-bit multiplicand, and b is
the 48-bit addend. The wafer probe test photos were made at a V_DD
supply of 5 V and a clock rate of about 1 MHz. The values in the
example, in octal code, are

a = 0.M134O5

x = 0740.3416

b = 060004241400105

output ax + b = 112.347040462613.

The results of tests run on packaged units to determine power
dissipation at various supply voltages are given in Fig. 6. These
curves include dc power due to leakage as well as dynamic power
dissipation. The data represent the average of six samples.

The multiplier efficiency, or the amount of energy it expends in
performing a given calculation, was determined. Since it takes 2n
clock pulses to obtain the complete product of two n-bit numbers, the
energy, E, per calculation is

E = P_D X 2n/F

where P_D is the chip's power dissipation at F and F is the clock
frequency.

At a V_DD of 5 V, and a clock rate of 10 MHz, the energy consumption
for multiplying two 16-bit numbers is

E = 20 X 10^-3 W X (32 / 10^7 Hz) = 64 nJ.
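The 64 nJ figure can be reproduced directly, taking the energy per calculation as the power dissipation multiplied by the duration of the 2n clock periods the product needs. A minimal sketch; the function names are illustrative.

```python
def multiply_time_s(n_bits, clock_hz):
    """Time for a full product of two n-bit numbers: 2n clock periods."""
    return 2 * n_bits / clock_hz


def energy_per_multiply_j(power_w, n_bits, clock_hz):
    """Energy per calculation: dissipation times the multiply time."""
    return power_w * multiply_time_s(n_bits, clock_hz)


if __name__ == "__main__":
    t = multiply_time_s(16, 10e6)
    e = energy_per_multiply_j(20e-3, 16, 10e6)
    print(f"time   = {t * 1e6:.1f} us")
    print(f"energy = {e * 1e9:.0f} nJ")
```

Because dynamic dissipation scales with clock frequency while the multiply time scales inversely, the product stays roughly constant across clock rates at a fixed supply voltage.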

This energy is fairly constant for a given operating voltage, at
different clock frequencies, since the power dissipation is
proportional to the clock frequency. By comparison, one other LSI chip
containing a 16 X 16 or greater multiply capacity [4] is a high-speed
triple-diffused bipolar device producing an output in 330 ns while
dissipating 3.5 W, for an energy of 100 nJ.

Fig. 4. Photomicrograph of CMOS/SOS multiplier (155 mils X 170 mils).
Applications 

The multiplier described is most advantageously used in those
applications where many simultaneous multiply operations are required,
or can be implemented. In such situations, word-parallel, bit-serial
architectures are used. Since there are up to 24 bits of multiplier
capacity per chip, system efficiency is increased when many multiplier
chips are effectively employed. Such architectures amortize the
control logic necessary to exercise the multiplier, i.e., providing
appropriate clock signals for the chip(s), etc. For example, in their
use for FFT units, higher throughputs would be possible using radix-4
organizations, employing more multiplier parallelism. In their use for
multinomial realizations, such as in the evaluation of the following
expression,

y = w0 + w1 x1 + w2 x2 + w3 x1 x2 + w4 x1^2 + w5 x2^2,

a serial-parallel multiplier bank is employed as shown in Fig. 7. All
first-order multiplications are done simultaneously, while
second-order multiplications are combined with the accumulation
feature of the cell (ax + b) so as to form the output function most
efficiently.

Fig. 6. Power versus speed for CMOS/SOS multiplier.

Fig. 7. Word parallel processor using custom multiplier for polynomial evaluation.

This multiplier can also be used with a microprocessor to provide the
micro with a hardware multiply feature. In particular, the
serial-parallel multiplier, operating from a separate 10-MHz clock
with control logic supervised by the microprocessor, can be shown to
give the COSMAC* a 35 μs multiply time for multiplying two 16-bit
numbers. Much of this time is in fact due to the data transfer to and
from the processor and multiplier. This represents about a 100 to 1
improvement in speed compared to software multiply.

Summary and Conclusions

A well known multiplier architecture has been implemented in CMOS/SOS
for improved performance. Basically an ax + b module, this multiplier
was integrated on a 155 mil X 170 mil chip with 24-bit capacity. The
relatively small chip size and low pin count required (16 pins) for
this type of multiplier will lead to more economical signal processing
implementations or to the incorporation of more associated logic on
the same chip where higher performance is necessary. Although the
maximum clock rate of the initial design is about 18 MHz, it is felt
that modifications can be made to increase this to 30 MHz. These
modifications would include a faster adder, where the sum output gate
does not have to wait for carry output generation (at a slight
increase in component count), and improved layout to minimize RC time
constants associated with interconnect tunnels.


layout of the multiplier. The chip was fabricated by the
Solid-State Technology Center, under H. Borkan's direction.

Internal Representation of Data in the
EAI-640 Computer

EAI-640 Basic Instruction Repertoire
Simulation Computer Programs


INTERNAL  REPRESENTATION  OF  DATA 


DATA  FORMAT 

A single datum consists of one or more words of
information depending on whether the mode is integer,
logical, real, double-precision, or complex. Negative
mantissas and/or exponents are carried in two's
complement form. The high order word of all data is
carried in the (A) register during computation where it is
expected by the code generated for the logical and
relational IF statement. The exact formats for the
different modes are as follows:

INTEGER 

An  integer  datum  consists  of  a right-justified  15  bit  (plus 
sign)  integer  value  carried  in  a single  word. 


LOGICAL

A logical datum consists of a one bit logical value; 1
means .TRUE. and 0 means .FALSE.. This bit is carried
in the sign portion of the register.

[Word diagram: the sign bit holds the logical value. These 15 bits may be anything.]


REAL

A real datum consists of a normalized 23 bit (plus sign)
mantissa along with a 7-bit (plus sign) integer exponent
carried in two consecutive words. A zero has a zero
mantissa but the exponent equals 200.

[Word diagram: 1st word, high order mantissa; 2nd word.]


DOUBLE PRECISION

A double-precision datum consists of a normalized
53-bit (plus sign) mantissa along with a 7 bit (plus sign)
integer exponent carried in four consecutive words. A
zero has exponent = 200.

[Word diagram: 1st word, high order 15 bits of mantissa; 2nd word, second 15 bits of mantissa.]


COMPLEX

A complex datum consists of two real data carried in
four consecutive words. The first real datum contains
the real part of the complex datum while the second real
datum carries the imaginary part.


SCALED FRACTION

A scaled fraction datum consists of an unnormalized
fractional value carried in a single word. The decimal
point is considered to be placed between the sign bit and
bit 1.
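The scaled fraction format can be illustrated with a short conversion sketch. It assumes a 16-bit word (sign plus 15 fraction bits, by analogy with the 15 bit plus sign integer), two's complement for negative values, and saturation at the ends of the range; none of these details is spelled out for this mode in the text above, and the function names are hypothetical.

```python
WORD_BITS = 16          # assumed word length: sign bit plus 15 bits
FRAC_BITS = 15          # binary point sits between the sign bit and bit 1


def to_scaled_fraction(value):
    """Encode a real value in [-1, 1) as a 16-bit scaled fraction word."""
    scaled = int(round(value * (1 << FRAC_BITS)))
    lowest, highest = -(1 << FRAC_BITS), (1 << FRAC_BITS) - 1
    scaled = max(lowest, min(highest, scaled))    # saturate (assumption)
    return scaled & ((1 << WORD_BITS) - 1)


def from_scaled_fraction(word):
    """Decode a 16-bit scaled fraction word back to a real value."""
    if word & (1 << (WORD_BITS - 1)):             # sign bit set
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)


if __name__ == "__main__":
    for v in (0.5, -0.5, 0.25, 1.0):
        w = to_scaled_fraction(v)
        print(v, oct(w), from_scaled_fraction(w))
```

With this layout every representable value has magnitude below 1, which is why the simulation programs that follow clamp element outputs just under unity.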


[Word-layout diagrams. Scaled fraction: a single word, with the decimal point considered to lie between the sign bit and bit 1. Complex: the 1st and 2nd words hold the real part and the 3rd and 4th words hold the imaginary part; each pair carries the mantissa sign, high order mantissa, low order mantissa, exponent sign, and exponent.]


FUNCTIONAL LISTING OF 640 INSTRUCTIONS


MNEMONIC CODE


OCTAL  CODE 


INSTRUCTION  DESCRIPTION 


TIME  IN 
MICROSECONDS 


CONDITION 

CODE** 


TRANSFERS

LA 

STA 

LX 

STX 


UCDDO 

iSEDOO 

05(E+2)DDD*

05EDDD*


Load Accumulator
Store Accumulator
Load Index Register
Store Index Register


3.30 

3.30 

3.30 

3.30


ARITHMETIC 

A 

S 

M 

D

SQR 

AOA 

AOM 

TCA 

CLR 

CAP 


tSEOOO 

17EDDD

03EDDD

01EDDD

021400 

020040 

07EDDD

020100 

026740 


Add

Subtract
Multiply
Divide
Square Root

Add One to Accumulator

Add One to Memory and Skip

Two's Complement Accumulator

Clear Accumulator

Clear and Add One to Accumulator


3.30 

3.30 

18.16 

18.976 

16.6 

1.66 

4.96 

1.66 

1.66 

1.66


LOGICAL 

OR 

XOR

AND 

C 

OCA 


10EDDD

11EDDD

13EDDD

12EDDD

020200 


Or (Logical Sum)

Exclusive Or (Logical Subtract)
And (Logical Product)

Compare

One's Complement Accumulator


3.30 
3.30 
3.30
3.30 
1.66 


SHIFTS 


ARS

ALS

ARD

ALD

LRS

LLS

LRD

LLD


0260(40 + SH)
0260SH
0261(40 + SH)
0261SH
0262(40 + SH)
0262SH
0263(40 + SH)
0263SH


Arithmetic Right Single
Arithmetic Left Single
Arithmetic Right Double
Arithmetic Left Double
Logical Right Single
Logical Left Single
Logical Right Double
Logical Left Double


See  Note  (1)  Below 


CONTROL 

SMP    024440    Set Multiple Precision Bit, Reset Carry Borrow Bit    1.66
RMP    024400    Reset Multiple Precision Bit                          1.66
SSP    026400    Set Sign of Accumulator Positive                      1.66
SSN    026440    Set Sign of Accumulator Negative                      1.66
NOP    027400    No Operation                                          1.66
P      02$-      Pause                                                 1.66
T      027***    Trap                                                  1.66

INTERRUPT 

SMI 

RMI 


024640 

024600 


Set Master Interrupt Bit
Reset  Master  Interrupt  Bit 


1.66 

1.66 


PROGRAM  PROTECT 
SPB
RPB 


020440 

020400 


Set  Protect  Bit 
Reset  Protect  Bit 


3.x 

3.x 


EXCHANGES 

EX 

n 

E$ 


026600 

026640 

026600 

026700 


Exchange Accumulator and Index Register
Exchange Accumulator and Q Register
Exchange Accumulator and Program Counter
Exchange Accumulator and Program Status Word


1.66 

1.66 

1.66

1.66 


U 

U 

U 

CC-AUi  16 


JUMPS & SKIPS
J 
L 

SSW 


SNU 


04EDDD
06EDDD
0234+ SW 


027417 


Jump Unconditional
Link

Skip on Sense Switch
SW: A = 200, E = 10,

B = 100, F = 4,

C = 40, G = 2,

D = 20, H = 1.

Skip Unconditional


1.66 

3.x 

2.476 


2475 


*E = 0, 1, 4 or 5 only. **U = Condition Code unchanged. 1, 2, 3, 4, 5 = valid class of Skip on Condition Code instructions.
EDDD = Four octal digits representing Effective Address option and Displacement Address


*** = 0 to 377


SH = Shift Count


Note (1)

Odd

Even 7+ 7S * number of microseconds

Where N = number of positions shifted


1 + 1 66 . number of microseconds


FUNCTIONAL LISTING OF 640 INSTRUCTIONS (Continued)





MNEMONIC CODE


SKIP ON REGISTER CONDITION

ICX

DCX

SKN

SKP

SAE

SQE


INSTRUCTION  DESCRIPTION 


Increment Index and Skip

Decrement Index and Skip

Skip if Accumulator Negative

Skip if Accumulator Positive

Skip if Accumulator Even

Skip if Q Register Even

XXX = positive number 1 to 377

XXX = negative number -1 to -400


TIME  IN 

MICROSECONDS 


CONDITION 

CODE** 


Class 3 Valid following OR, XOR and AND

SNZ


Class 4 Valid following C
SE
SG
SL
SNE
SGE
SLE


Class 5 Valid following ALS and ALD
SO
SNO
SAO
NAO


INPUT OUTPUT
DI
DO
RI
RO
DF
SI
TTI
TDI


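Skip if result was Zero

Skip if result was Plus

Skip if result was Minus

Skip if result caused Overflow

Skip if result was Not Zero

Skip if result was Not Plus

Skip if result was Not Minus

Skip if result did Not cause Overflow

Skip if result was Plus or Zero

Skip if result was Minus or Zero

Skip if result was Zero or caused Overflow

Skip if result was Plus or Minus

Skip if result was Plus or caused Overflow

Skip if result was Minus or caused Overflow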


Skip if result caused Overflow
Skip if result did Not cause Overflow


Skip if result was Zero
Skip if result was Not Zero


Skip if Operands Equal

Skip if Accumulator was Greater

Skip if Accumulator was Less

Skip if Accumulator was Not Equal

Skip if Accumulator was Greater or Equal

Skip if Accumulator was Less or Equal


Skip if result caused Overflow
Skip if result did Not cause Overflow
Skip if result is About to Overflow
Skip if result is Not About to Overflow


Data Input
Data Output
Record Input
Record Output
Device Function
Status Input
Timer Channel
Test Device Interrupt


0066DN

Channel Zero

00670N

Channel One

oomN

Channel Two or DMAC Devices

00/IDN

Channel Three including Teletype

00?13?ON

Direct Memory Access Controller

TAI

007ZDN

Alarm Channel

BRI

000(4- P)DN

Buffered Device Input

6 6-4 1 6S per word

BRO

00?(4 . P)DN

Buffered Device Output

6 6f 1.6S per word

BDF

OOS(4+P)ON

Buffered Device Function

3.10

BSI

004(44 P)0N

Buffered Status Input

3.30


PAGE 1 C FPNET

C     THIS PROGRAM IS A FIXED POINT VERSION OF PNET DESIGNED TO WORK
C     ON THE 640 USING THE SCALED FRACTION FIXED POINT WORD FORMAT.
C
      DIMENSION X(50),W(15,8),IFM(40),ALPHA(50),N1(50),N2(50),NW(50)
     1,SCALE(16),YSAVE(8),EABS(50)
      DOUBLE PRECISION WD,YD
      DIMENSION WD(15,7)
      SCALED FRACTION X,W,WT,Y,YTRUE,EABS
      INTEGER ALPHA
      LOGICAL SENSW
      COMMON /SHIFT/ISFT
      READ(6,12)NPAR,NREC,N,IWAIT,ISFT
   12 FORMAT(5I5)
      READ(6,16)(SCALE(I),I=1,16)
   16 FORMAT(8F10.4)
      READ(6,13)(IFM(I),I=1,40)
   13 FORMAT(40A2)
      CALL WEIGHT(NELEM,ALPHA,N1,N2,NW,W)
      IF(SENSW(2))WRITE(1,74)
      IF(.NOT.SENSW(2))WRITE(1,73)
   73 FORMAT(1H1,6X,36HSINGLE PRECISION ELEMENT COMPUTATION///)
   74 FORMAT(1H1,6X,36HDOUBLE PRECISION ELEMENT COMPUTATION///)
      DO 80 LOOP=1,NREC
      READ(6,IFM)(X(I),I=1,NPAR),YTRUE
      YSAVE(1)=YTRUE
      I=1
      DO 79 IS=1,4
      ISFT=IS-1
      DO 60 IN=1,6
      I=I+1
      NBITS=18-(2*IN)
      CALL ASSGN(NBITS)
      CALL YNET(X,NPAR,NELEM,ALPHA,N1,N2,NW,W,Y,NBITS)
      CALL DNET(X,NPAR,NELEM,ALPHA,N1,N2,NW,WD,YD,NBITS,ISFT,W)
      CALL STATS(IS,LOOP,IN,NELEM,NREC,W,WD)
   60 CONTINUE
   79 CONTINUE
   80 CONTINUE
      STOP
      END


PAGE 1 C STATS

      SUBROUTINE STATS(ISFT,LOOP,NBITS,NELEM,NREC,W,WD)
      DIMENSION AAE(6,4,4),MXAE(6,4,4),AAVT(6,4,4),AAV(6,4,4),MXAVT(6,4,
     14),MNAVT(6,4,4),MXAV(6,4,4),MNAV(6,4,4),IST(4),IFIN(4),
     2 W(15,8),WD(15,7),OUT(15,2),ERROR(15),SUM(4)
      SCALED FRACTION W
      DOUBLE PRECISION WD
      REAL AAE,MXAE,AAVT,MXAVT,MNAVT,MXAV,MNAV
      LOGICAL SENSW
      COMMON /SAV/ OUT
      DATA IST(1),IFIN(1),IST(2),IFIN(2),IST(3),IFIN(3),IST(4),IFIN(4)
     1/1,8,9,12,13,14,15,15/
      DREC=FLOAT(NREC)
      SUM(1)=DREC*8.
      SUM(2)=DREC*4.
      SUM(3)=DREC*2.
      SUM(4)=DREC
C
C     INITIALIZE ALL ARRAYS
C
      IF(ISFT.NE.1)GO TO 15
      IF(LOOP.NE.1)GO TO 15
      IF(NBITS.NE.1)GO TO 15
      DO 70 I=1,6
      DO 69 L=1,4
      DO 68 K=1,4
      AAE(I,L,K)=0.
      AAV(I,L,K)=0.
      MXAE(I,L,K)=0.
      AAVT(I,L,K)=0.
      MXAV(I,L,K)=0.
      MXAVT(I,L,K)=0.
   68 CONTINUE
   69 CONTINUE
   70 CONTINUE
   15 CONTINUE
C
C     CONVERT SCALED FRACTION AND DOUBLE PRECISION ELEMENT OUTPUTS TO
C     COMMON MODE AND COMPUTE ABSOLUTE ERROR.
C
      DO 12 I=1,NELEM
      OUT(I,1)=W(I,7)
      OUT(I,1)=ABS(OUT(I,1))
      IF(NBITS.EQ.1)OUT(I,2)=WD(I,7)
      IF(SENSW(7))OUT(I,2)=WD(I,7)
      OUT(I,2)=ABS(OUT(I,2))
      E=OUT(I,1)-OUT(I,2)
      ERROR(I)=ABS(E)
   12 CONTINUE
C
C
      DO 25 L=1,4
      IA=IST(L)
      IB=IFIN(L)
      DO 20 I=IA,IB
      AAE(NBITS,L,ISFT)=AAE(NBITS,L,ISFT)+ERROR(I)
      IF(ERROR(I).GT.MXAE(NBITS,L,ISFT))MXAE(NBITS,L,ISFT)=ERROR(I)
      AAVT(NBITS,L,ISFT)=AAVT(NBITS,L,ISFT)+OUT(I,2)
      AAV(NBITS,L,ISFT)=AAV(NBITS,L,ISFT)+OUT(I,1)
      IF(OUT(I,2).GT.MXAVT(NBITS,L,ISFT))MXAVT(NBITS,L,ISFT)=OUT(I,2)
      IF(OUT(I,1).GT.MXAV(NBITS,L,ISFT))MXAV(NBITS,L,ISFT)=OUT(I,1)
C
   20 CONTINUE
   25 CONTINUE
      IF(.NOT.SENSW(8))GO TO 191
      WRITE(1,189)AAE(NBITS,1,ISFT),MXAE(NBITS,1,ISFT)
C     (FORMAT STATEMENT 189 IS ILLEGIBLE IN THE SOURCE COPY)
  191 CONTINUE
C
C
      IF(LOOP.NE.NREC) GO TO 60
      IF(ISFT.NE.4) GO TO 60
      IF(NBITS.NE.6) GO TO 60
      DO 30 I=1,6
      DO 29 K=1,4
      DO 28 L=1,4
      AAE(I,L,K)=AAE(I,L,K)/SUM(L)
      IF(AAE(I,L,K).EQ.0.)GO TO 160
      AAE(I,L,K)=20.*ALOG10(AAE(I,L,K))
  160 CONTINUE
      IF(MXAE(I,L,K).EQ.0.)GO TO 161
      MXAE(I,L,K)=20.*ALOG10(MXAE(I,L,K))
  161 CONTINUE
      AAVT(I,L,K)=AAVT(I,L,K)/SUM(L)
      AAV(I,L,K)=AAV(I,L,K)/SUM(L)
   28 CONTINUE
   29 CONTINUE
   30 CONTINUE
C
C
      DO 81 ISF=1,4
      NSFT=ISF-1
      WRITE(1,59) NSFT
   59 FORMAT(///6HSHIFT ,I5/)
      DO 80 IL=1,4
      WRITE(1,49)
   49 FORMAT(//18HAVG ABS ERROR     /)
      WRITE(1,50)IL,(AAE(NB,IL,ISF),NB=1,6)
   50 FORMAT(I5,5X,6F10.5)
      WRITE(1,52)
   52 FORMAT(//13HMAX ABS ERROR/)
      WRITE(1,50)IL,(MXAE(NB,IL,ISF),NB=1,6)
      WRITE(1,51)
   51 FORMAT(//18HTRUE AVG ABS VALUE/)
      WRITE(1,50)IL,(AAVT(NB,IL,ISF),NB=1,6)
      WRITE(1,53)
   53 FORMAT(//15HAVG ABS VALUE  /)
      WRITE(1,50)IL,(AAV(NB,IL,ISF),NB=1,6)
      WRITE(1,54)
   54 FORMAT(//18HTRUE MAX ABS VALUE/)
      WRITE(1,50)IL,(MXAVT(NB,IL,ISF),NB=1,6)
      WRITE(1,56)
   56 FORMAT(//13HMAX ABS VALUE/)
      WRITE(1,50)IL,(MXAV(NB,IL,ISF),NB=1,6)
   80 CONTINUE
   81 CONTINUE
C
   60 RETURN
      END


PAGE  1 C DNET 


SUBROUTINE  INET  C X , NPA  R , f.  ELPr . ALPl,  A . N' I 


k O 


■ :V.wr.YD,M.lSFT.W' 
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I i . I ....1  v-i  Au  ; ,,iLi  I.  . u ; , , I u j , u u , W(  n ,i  ) , '*  , 

1 XD(50)  ,WD(  i:j,7) 

      SCALED FRACTION X,XM,MASK,W,WM
      REAL XR,WR,RSFT,ROUND
      DOUBLE PRECISION XD,WD,X1D,X2D,WTD,DPELEM,DSFT,YD
      INTEGER ALPHA,LETTER
      DATA LETTER/1HX/
C
C     MASK X'S AND W'S AND DEFINE DP X'S AND W'S.
C
      DO 20 I=1,NPAR
      XM=X(I)
      IF(M.EQ.16)GO TO 28
      XM=MASK(XM)
   28 XR=XM
      XD(I)=XR
   20 CONTINUE
C
      DO 26 N=1,6
      DO 25 IJK=1,NELEM
      WM=W(IJK,N)
      IF(M.EQ.16)GO TO 19
      WM=MASK(WM)
   19 WR=WM
   25 WD(IJK,N)=WR
   26 CONTINUE
C
C     COMPUTE NETWORK
C
      DO 30 IJK=1,NELEM
      I=N1(IJK)
      J=N2(IJK)
      IF(ALPHA(IJK).NE.LETTER) GO TO 15
      X1D=XD(I)
      X2D=XD(J)
      GO TO 16
   15 X1D=WD(I,7)
      X2D=WD(J,7)
   16 N=NW(IJK)
      DO 18 NT=1,N
   18 WTD(NT)=WD(IJK,NT)
      WD(IJK,7)=DPELEM(X1D,X2D,N,WTD)
      RSFT=FLOAT(ISFT)
      IF(ISFT.EQ.0)GO TO 30
      RSFT=2.**RSFT
      DSFT=RSFT
      WD(IJK,7)=WD(IJK,7)*DSFT
   30 CONTINUE
      YD=WD(NELEM,7)
C
C     ROUND ALL ELEMENT OUTPUTS TO N BITS.
C
      ROUND=2.**(-M-1)
      DO 40 IJK=1,NELEM
      WR=WD(IJK,7)

PAGE  2 C DNET

      IF(M.EQ.16)GO TO 36
      WR=WR+ROUND
   36 CONTINUE
      IF(WR.GE..99990)WR=.99990
      IF(WR.LE.-.99990)WR=-.99990
      WM=WR
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I 


- 1 ..  ..  r ( ) 

      W(IJK,6)=WM
   40 CONTINUE
C
      RETURN
      END


PAGE  1 C READ  SCALED  FRACTION  WEIGHTS 


      SUBROUTINE WEIGHT(NELEM,ALPHA,N1,N2,NW,W)
C
C     THIS SUBPROGRAM READS IN A SCALED FRACTION NET
C
      DIMENSION ALPHA(1),W(15,7),N1(1),N2(1),NW(1)
      SCALED FRACTION W
      INTEGER ALPHA
      LOGICAL SENSW
C
      READ(6,11) NELEM
   11 FORMAT(I5)
      DO 30 IJK=1,NELEM
      READ(6,12)ALPHA(IJK),N1(IJK),N2(IJK),N,(W(IJK,I),I=1,N)
   12 FORMAT(A1,I2,1X,2I2,6S10)
      NW(IJK)=N
      IF(.NOT.SENSW(5)) GO TO 59
      WRITE(1,13)IJK,ALPHA(IJK),N1(IJK),N2(IJK),NW(IJK),(W(IJK,I),I=1,N)
   13 FORMAT(1H ,I5,3X,A1,3I4,2X,6S7/)
   59 CONTINUE
   30 CONTINUE
      RETURN
      END


PAGE  1 C YNET


      SUBROUTINE YNET(X,NPAR,NELEM,ALPHA,N1,N2,NW,W,Y,M)
C
C     YNET PERFORMS THE TRANSFORMATION Y=X*W
C
      DIMENSION X(1),ALPHA(1),N1(1),N2(1),NW(1),W(15,8),WT(6)
      INTEGER ALPHA,LETTER
      SCALED FRACTION X,W,WT,X1,X2,YELEM,Y,YELEMI
      LOGICAL SENSW
      DATA LETTER/1HX/
C
      DO 30 IJK=1,NELEM
      I=N1(IJK)
      J=N2(IJK)
      IF(ALPHA(IJK).NE.LETTER) GO TO 15
      X1=X(I)
      X2=X(J)
      GO TO 16
   15 X1=W(I,7)
      X2=W(J,7)
   16 N=NW(IJK)
      DO 18 NT=1,N
   18 WT(NT)=W(IJK,NT)
      W(IJK,7)=YELEM(X1,X2,N,WT)
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J r \ ; 


* 


. V <■  .1  J . W .Jh  , I ^ 


C
C     SHIFT THE ELEMENT OUTPUT
C
      CALL SFT(W(IJK,7))
   30 CONTINUE
      Y=W(IJK,7)
      RETURN
      END


J.I..  / I C A J I , U 1 f I'D 

--INTERMEDIATE  SHIFT .C 


PAGE  1 C ASSIGN MASK

      SUBROUTINE ASSGN(N)
      COMMON MSK
      INTEGER M(16)
      DATA M(1),M(2),M(3),M(4),M(5),M(6),M(7),M(8),M(9),M(10),M(11),
     1 M(12),M(13),M(14),M(15),M(16)/-'00000,
     1 -'40000,-'20000,-'10000,
     2 -'04000,-'02000,-'01000,
     3 -'00400,-'00200,-'00100,
     4 -'00040,-'00020,-'00010,
     5 -'00004,-'00002,-'00001/
      MSK=M(N)
      RETURN


PAGE  1 C MASK A SF TO DESIRED BITS

      FUNCTION MASK(X)
      SCALED FRACTION X
      SCALED FRACTION MASK
      INTEGER MSK
      COMMON MSK
C
C     USE LOGICAL AND FUNCTION TO MASK N BITS OF WORD X
C
      LA    X
      AND   MSK
      STA   MASK
C
      RETURN
      END
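The masking step can also be illustrated outside the original machine code. The following is a minimal Python sketch, not the report's implementation. It assumes a 16-bit two's-complement word and assumes that the mask for N retained bits is the 16-bit pattern of -(2**(16-N)), which agrees with the listed table entries -'40000 (second entry) and -'00001 (sixteenth entry). The names `mask_for` and `mask_word` are hypothetical.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1  # 0xFFFF, all 16 bits set


def mask_for(n_bits: int) -> int:
    """Return a 16-bit mask that keeps the n_bits most significant bits."""
    if not 1 <= n_bits <= WORD_BITS:
        raise ValueError("n_bits must be between 1 and 16")
    # Two's-complement pattern of -(2**(16 - n_bits)),
    # e.g. -0o40000 for n_bits = 2 and -0o00001 for n_bits = 16.
    return (-(1 << (WORD_BITS - n_bits))) & WORD_MASK


def mask_word(word: int, n_bits: int) -> int:
    """Logical AND of a 16-bit word with the mask; low-order bits become zero."""
    return word & mask_for(n_bits)
```

Because the AND simply clears low-order bits of a two's-complement word, the masked value is truncated toward minus infinity rather than rounded.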


PAGE  1 C YELEM 


C 

C 

C 


      FUNCTION YELEM(X1,X2,N,WT)
      DIMENSION WT(6)
      SCALED FRACTION X1,X2,WT,YELEM,TEMP,X1S,X2S,X12,T1,T2,T3,T4,T5,T6
      SCALED FRACTION MASK
      LOGICAL SENSW
      EXTERNAL MASK
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'ULJt  i V tf  Li  i..  i 

- : .,11  i>i>u  .» ' It,  .Ji  Li^  1 1 , 1.1/  i.ii;/ 

C
      X1=MASK(X1)
      X2=MASK(X2)
      DO 20 K=1,N
      TEMP=WT(K)
   20 WT(K)=MASK(TEMP)
      IF(N.EQ.6)GO TO 22
      WT(5)=.00000S
      WT(6)=.00000S
   22 CONTINUE
C
C     COMPUTE X SQUARES, CROSS PROD, AND MASK.
C
      X1S=X1*X1
      X2S=X2*X2
      X12=X1*X2
      X1S=MASK(X1S)
      X2S=MASK(X2S)
      X12=MASK(X12)
C
C     MULTIPLY W'S AND X'S AND MASK
C
      T1=WT(1)
      T2=WT(2)*X1
      T3=WT(3)*X2
      T4=WT(4)*X12
      T5=WT(5)*X1S
      T6=WT(6)*X2S
      T1=MASK(T1)
      T2=MASK(T2)
      T3=MASK(T3)
      T4=MASK(T4)
      T5=MASK(T5)
      T6=MASK(T6)
C
      YELEM=T1+T2+T3+T4+T5+T6
      RETURN
      END


PAGE  1 C YELEMI

      FUNCTION YELEMI(X1,X2,N,WT,M)
      DIMENSION WT(6)
      SCALED FRACTION X1,X2,YELEMI,WT,SUMA,SUMQ,T2A,T2Q,T3A,T3Q,
     1 T4A,T4Q,T5A,T5Q,T6A,T6Q,ROUND
      SCALED FRACTION MASK,TEMP
      LOGICAL SENSW
      EXTERNAL MASK
C
C     MASK ALL W'S AND X'S.
C
      X1=MASK(X1)
      X2=MASK(X2)
      DO 20 K=1,N
      TEMP=WT(K)
   20 WT(K)=MASK(TEMP)
      IF(N.EQ.6)GO TO 22
      WT(5)=.0000S
      WT(6)=.0000S
   22 CONTINUE
C
C     MULTIPLY X'S AND W'S TO FORM THE 6 TERMS.  MULT2 SAVES THE 2T BIT
C     PRODUCT IN THE LAST 2 ARGS.  SAME WITH MULT3.
C
      CALL MULT2(WT(2),X1,T2A,T2Q)
      CALL MULT2(WT(3),X2,T3A,T3Q)
      CALL MULT3(WT(4),X1,X2,T4A,T4Q,M)
      CALL MULT3(WT(5),X1,X1,T5A,T5Q,M)
      CALL MULT3(WT(6),X2,X2,T6A,T6Q,M)
C
      SUMA=WT(1)
      SUMQ=.0000S
C
C     ADD THE 6 DOUBLE PRECISION (2T) TERMS AND STORE DP RESULT IN SUMA,SUMQ
C
      CALL DPADD(SUMA,SUMQ,T2A,T2Q)
      CALL DPADD(SUMA,SUMQ,T3A,T3Q)
      CALL DPADD(SUMA,SUMQ,T4A,T4Q)
      CALL DPADD(SUMA,SUMQ,T5A,T5Q)
      CALL DPADD(SUMA,SUMQ,T6A,T6Q)
C
C     ROUND TO T BIT POSITIONS AND MASK.
C
      CALL RND(SUMA,SUMQ,M)
      YELEMI=SUMA
      RETURN
      END


PAGE  1 C DPELEM

      FUNCTION DPELEM(DX1,DX2,M,DWT)
      DIMENSION DWT(6)
      DOUBLE PRECISION DX1,DX2,DWT,DX1S,DX2S,DX12,DPELEM
C
C     PERFORM ELEMENT COMPUTATION IN DOUBLE PRECISION.
C
      DX1S=DX1*DX1
      DX2S=DX2*DX2
      DX12=DX1*DX2
C
      DPELEM=DWT(1)+DWT(2)*DX1+DWT(3)*DX2+DWT(4)*DX12
     1 +DWT(5)*DX1S+DWT(6)*DX2S
      RETURN
      END
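As a cross-check on the element computation, the same six-term polynomial can be written in a few lines of Python. This is a minimal sketch in ordinary floating point, with none of the masking or scaled-fraction arithmetic of the other routines; the function name `dpelem` and the zero-based weight indexing are choices made for this illustration only.

```python
from typing import Sequence


def dpelem(x1: float, x2: float, w: Sequence[float]) -> float:
    """Evaluate w1 + w2*x1 + w3*x2 + w4*x1*x2 + w5*x1**2 + w6*x2**2.

    w[0]..w[5] correspond to DWT(1)..DWT(6) in the FORTRAN listing.
    """
    if len(w) != 6:
        raise ValueError("expected six weights")
    return (w[0]
            + w[1] * x1
            + w[2] * x2
            + w[3] * x1 * x2
            + w[4] * x1 * x1
            + w[5] * x2 * x2)
```

For example, `dpelem(0.5, 0.25, [1, 2, 3, 4, 5, 6])` sums the terms 1, 1, 0.75, 0.5, 1.25 and 0.375.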


PAGE  1 C MULT2

      SUBROUTINE MULT2(A,B,C1,C2)
      SCALED FRACTION A,B,C1,C2
C
C
      LA    A
      M     B
      STA   C1
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C 

C     M IS THE NUM OF COMPUTATIONAL BITS AVAILABLE.
C     THE PROD OF 'A' AND 'B' ARE ROUNDED TO M BITS BEFORE MULTIPLYING C
C
      RR=2.0**(-M-1)
      ROUND=RR
      IF(M.NE.16)GO TO 21
      D1=A*B
      GO TO 22
C
   21 CONTINUE
      LA    A
      M     B
      A     ROUND
      STA   D1
C
      D1=MASK(D1)
   22 CONTINUE
C
      LA    D1
      M     C
      STA   D1
      OCT   026540
      STA   D2
C
      RETURN
      END


PAGE  1 C DPADD

      SUBROUTINE DPADD(A1,A2,B1,B2)
      SCALED FRACTION A1,A2,B1,B2
C
C     DPADD PERFORMS A DOUBLE PRECISION ADD OF A AND B, RESULTS STORED I
C
C
      LA    A2
      A     B2
      OCT   026357
      A     A1
      A     B1
      STA   A1
      OCT   026740
      OCT   026317
      STA   A2
C


O. . i.  mV.) 


STA  C2 
RETURN 
END 


iV. 


MULT3 


      SUBROUTINE MULT3(A,B,C,D1,D2,M)
      SCALED FRACTION A,B,C,D1,D2,ROUND
      SCALED FRACTION MASK
      LOGICAL SENSW
      EXTERNAL MASK
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► 


P ' ~ ^ r - . r ^ - 


PAGE  1 C 


C 

C ROUND

      SUBROUTINE RND(A1,A2,N)
      SCALED FRACTION A1,ROUND,MASK,A2
      EXTERNAL MASK
C
C     RND WILL ROUND A1 TO N BITS AND MASK LEAST SIG BITS WITH ZEROS.
C
      IF(N.EQ.16)GO TO 20
      RR=2.0**(-N-1)
      ROUND=RR
      A1=A1+ROUND
      A1=MASK(A1)
      RETURN
C
   20 CONTINUE
      LA    A2
      OCT   026256
      A     A1
      STA   A1
      RETURN
      END
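The round-then-mask idiom used for word lengths below 16 bits (add a rounding constant, then clear the low-order bits) can be sketched in Python as follows. This is an illustration in floating point, not the scaled-fraction code. It assumes the mask retains `frac_bits` fraction bits, so that the added constant 2**(-frac_bits-1) is exactly half of the least significant retained bit, and it models masking of a two's-complement word as truncation toward minus infinity. The name `round_to_bits` is hypothetical.

```python
import math


def round_to_bits(value: float, frac_bits: int) -> float:
    """Round value to frac_bits fraction bits by adding half an LSB and truncating."""
    half_lsb = 2.0 ** (-frac_bits - 1)   # the ROUND constant, 2**(-N-1)
    lsb = 2.0 ** (-frac_bits)            # weight of the last retained bit
    # Masking a two's-complement word clears low bits, i.e. floors to a multiple of lsb.
    return math.floor((value + half_lsb) / lsb) * lsb
```

With three fraction bits, 0.3 rounds down to 0.25, while the exact midpoint 0.3125 rounds up to 0.375, which is the round-half-up behaviour this idiom produces.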


PAGE  1 C SFT 


      SUBROUTINE SFT(VALUE)
C     SFT PERFORMS AN ARITHMETIC LEFT SHIFT ON ALL ELEMENT OUTPUTS.
C
      SCALED FRACTION VALUE
      REAL VAL,A
      COMMON /SHIFT/ISFT
C
      IF(ISFT.EQ.0)GO TO 10
C
C     CHECK FOR OVERFLOW
C
      VAL=VALUE
      S=FLOAT(ISFT)
      A=VAL*2.**S
      IF(A.GE..9999)VALUE=.9999S
      IF(A.LE.-.9999)VALUE=-.9999S
      A=ABS(A)
      IF(A.GE..9999)GO TO 10
      GO TO (1,2,3,4,5),ISFT
    1 CONTINUE
      LA    VALUE
      OCT   026001
      STA   VALUE
      GO TO 10
    2 CONTINUE
      LA    VALUE
      OCT   026002
      STA   VALUE
      GO TO 10
    3 CONTINUE
      LA    VALUE
      OCT   026003
      STA   VALUE
      GO TO 10
    4 CONTINUE
      LA    VALUE
      OCT   026004
      STA   VALUE
      GO TO 10
    5 CONTINUE
      LA    VALUE
      OCT   026005
      STA   VALUE
   10 RETURN
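The overflow check in SFT amounts to a saturating left shift: the value is scaled by a power of two and clamped to just under plus or minus one when the result would not fit in a fraction. A minimal Python sketch of that behaviour is given below. It works in floating point, uses the 0.9999 threshold from the listing for both the test and the clamp value, and does not restrict the shift count to 1 through 5 as the computed GO TO does. The name `shift_saturating` is hypothetical.

```python
LIMIT = 0.9999  # saturation threshold and clamp value taken from the listing


def shift_saturating(value: float, isft: int) -> float:
    """Multiply value by 2**isft, clamping the result to +/-LIMIT on overflow."""
    if isft == 0:
        return value
    shifted = value * 2.0 ** isft
    if shifted >= LIMIT:
        return LIMIT
    if shifted <= -LIMIT:
        return -LIMIT
    return shifted
```

Clamping keeps an overflowing element output at full scale with the correct sign, instead of letting the shift wrap around.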


XB  CORRECT 


APPENDIX  J 


An  algorithm  is  derived  below  to  estimate  recursively  the  mean  and 
the  sum  of  the  feature  variances  of  patterns  from  a particular 
class. 


The symbols have the following interpretation:

$y$ - n-dimensional pattern vector.

$\mu_N$ - centroid at the $N$th iteration.

$\mu_{j,N}$ - estimate of the $j$th element of $\mu$ at the $N$th iteration.

$\Lambda_N$ - sample covariance matrix of $y$ at the $N$th iteration.

$\sigma^2_{j,N}$ - sample variance of $y_j$ at the $N$th iteration.
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# 


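Now following is a procedure by which we can estimate recursively the mean and covariance matrix of $y$.

The sample covariance matrix of $y$ at the $(N+1)$th stage is

\[
(N+1)\,\Lambda_{N+1} =
\begin{bmatrix}
\sum_{i=1}^{N+1} (y_1-\mu_1)_i^2 & \sum_{i=1}^{N+1} (y_1-\mu_1)_i (y_2-\mu_2)_i & \cdots & \sum_{i=1}^{N+1} (y_1-\mu_1)_i (y_n-\mu_n)_i \\
\sum_{i=1}^{N+1} (y_1-\mu_1)_i (y_2-\mu_2)_i & \sum_{i=1}^{N+1} (y_2-\mu_2)_i^2 & \cdots & \sum_{i=1}^{N+1} (y_2-\mu_2)_i (y_n-\mu_n)_i \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{N+1} (y_1-\mu_1)_i (y_n-\mu_n)_i & \sum_{i=1}^{N+1} (y_2-\mu_2)_i (y_n-\mu_n)_i & \cdots & \sum_{i=1}^{N+1} (y_n-\mu_n)_i^2
\end{bmatrix}
\]

\[
(N+1)\,\mathrm{Tr}\,\Lambda_{N+1} = \sum_{j=1}^{n} \sum_{i=1}^{N+1} (y_j-\mu_j)_i^2 = (N+1) \sum_{j=1}^{n} \sigma^2_{j,N+1} \tag{1a}
\]

\[
N\,\mathrm{Tr}\,\Lambda_{N} = \sum_{j=1}^{n} \sum_{i=1}^{N} (y_j-\mu_j)_i^2 = N \sum_{j=1}^{n} \sigma^2_{j,N} \tag{1b}
\]

\[
\mathrm{Tr}\,\Lambda_{N+1} = \sum_{j=1}^{n} \sigma^2_{j,N+1} \tag{2a}
\]

\[
\mathrm{Tr}\,\Lambda_{N} = \sum_{j=1}^{n} \sigma^2_{j,N} \tag{2b}
\]

\[
\sum_{i=1}^{N+1} y_j^2(i) = (N+1)\left(\sigma^2_{j,N+1} + \mu^2_{j,N+1}\right) \tag{3}
\]

\[
\sum_{i=1}^{N} y_j^2(i) = N\left(\sigma^2_{j,N} + \mu^2_{j,N}\right) \tag{4}
\]

Subtracting (4) from (3),

\[
y^2_{j,N+1} = (N+1)\left(\sigma^2_{j,N+1} + \mu^2_{j,N+1}\right) - N\left(\sigma^2_{j,N} + \mu^2_{j,N}\right) \tag{5}
\]

Dropping the subscript $j$,

\[
y^2_{N+1} = (N+1)\left(\sigma^2_{N+1} + \mu^2_{N+1}\right) - N\left(\sigma^2_{N} + \mu^2_{N}\right)
\]

\[
\sigma^2_{N+1} = \left(\frac{N}{N+1}\right)\left(\sigma^2_{N} + \mu^2_{N}\right) + \frac{y^2_{N+1}}{N+1} - \mu^2_{N+1} \tag{7}
\]

\[
\mu_{N+1} = \left(\frac{N}{N+1}\right)\mu_{N} + \frac{y_{N+1}}{N+1}
\]

\[
\mu^2_{N+1} = \left(\frac{N}{N+1}\right)^2 \mu^2_{N} + \frac{y^2_{N+1}}{(N+1)^2} + \frac{2N}{(N+1)^2}\,\mu_N\,y_{N+1}
\]

Substituting in (7),

\[
\sigma^2_{N+1} = \left(\frac{N}{N+1}\right)\left(\sigma^2_{N} + \mu^2_{N}\right) + \frac{y^2_{N+1}}{N+1}
 - \left(\frac{N}{N+1}\right)^2 \mu^2_{N} - \frac{y^2_{N+1}}{(N+1)^2} - \frac{2N}{(N+1)^2}\,\mu_N\,y_{N+1}
\]

\[
\sigma^2_{N+1} = \left(\frac{N}{N+1}\right)\sigma^2_{N} + \frac{N}{(N+1)^2}\,\mu^2_{N} + \frac{N}{(N+1)^2}\,y^2_{N+1} - \frac{2N}{(N+1)^2}\,\mu_N\,y_{N+1}
\]

\[
\sigma^2_{N+1} = \left(\frac{N}{N+1}\right)\sigma^2_{N} + \frac{N}{(N+1)^2}\left(y_{N+1} - \mu_N\right)^2
\]

\[
(N+1)\,\sigma^2_{N+1} = N\,\sigma^2_{N} + \left(\frac{N}{N+1}\right)\left(y_{N+1} - \mu_N\right)^2
\]

The $(N+1)$st estimate of $\sigma_j^2$ is related to the $N$th estimate of $\sigma_j^2$ through the above relationship. Thus:

\[
\sigma^2_{j,N+1} = \left(\frac{N}{N+1}\right)\sigma^2_{j,N} + \frac{N}{(N+1)^2}\left(y_{j,N+1} - \mu_{j,N}\right)^2
\]

\[
\sum_{j=1}^{n} \sigma^2_{j,N+1} = \left(\frac{N}{N+1}\right)\sum_{j=1}^{n} \sigma^2_{j,N} + \frac{N}{(N+1)^2}\sum_{j=1}^{n}\left(y_{j,N+1} - \mu_{j,N}\right)^2
\]

This can be set up on the computer with an initial

\[
\sum_{j=1}^{n} \sigma^2_{j,0} = 1 \text{ or } 0
\]

The ratio of the traces of the sampled feature variance matrices of the two classes can be estimated recursively:

\[
\mathrm{Tr}(\Lambda_{N+1}) = \left(\frac{N}{N+1}\right)\mathrm{Tr}(\Lambda_{N}) + \frac{N}{(N+1)^2}\sum_{j=1}^{n}\left(y_{j,N+1} - \mu_{j,N}\right)^2 \tag{16}
\]

and:

\[
\mu_{j,N+1} = \left(\frac{N}{N+1}\right)\mu_{j,N} + \frac{y_{j,N+1}}{N+1} \tag{17}
\]

or, in vector form,

\[
\mathrm{Tr}(\Lambda_{N+1}) = \left(\frac{N}{N+1}\right)\mathrm{Tr}(\Lambda_{N}) + \frac{N}{(N+1)^2}\left(y_{N+1} - \mu_N\right)^{T}\left(y_{N+1} - \mu_N\right)
\]

and

\[
\mu_{N+1} = \left(\frac{N}{N+1}\right)\mu_{N} + \frac{y_{N+1}}{N+1} \tag{19}
\]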
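The recursion for the centroid and for the trace of the sample covariance matrix can be exercised numerically. The Python sketch below is an illustration under stated assumptions rather than the program used in this work: it starts from N = 0 with a zero centroid and a zero trace (one of the two initial values mentioned above), and the function name `recursive_mean_and_trace` is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def recursive_mean_and_trace(
    patterns: Iterable[Sequence[float]],
) -> Tuple[List[float], float]:
    """Update the centroid and the summed feature variances one pattern at a time."""
    mean: List[float] = []
    trace = 0.0
    n = 0  # number of patterns absorbed so far (N in the derivation)
    for y in patterns:
        if not mean:
            mean = [0.0] * len(y)
        # Squared distance of the new pattern from the previous centroid.
        dist2 = sum((yj - mj) ** 2 for yj, mj in zip(y, mean))
        # Trace update: Tr(N+1) = N/(N+1) * Tr(N) + N/(N+1)**2 * dist2.
        trace = (n / (n + 1)) * trace + (n / (n + 1) ** 2) * dist2
        # Centroid update: mu(N+1) = N/(N+1) * mu(N) + y/(N+1).
        mean = [(n / (n + 1)) * mj + yj / (n + 1) for yj, mj in zip(y, mean)]
        n += 1
    return mean, trace
```

For the three patterns (1, 2), (3, 6), (5, 1), the recursion gives the centroid (3, 3) and a trace of 22/3, the same as the batch computation of the two per-feature variances 8/3 and 14/3 with divisor N.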
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e ' ! ^ 
/' l V i 


APPLICATION  PROGRAM 


DIMENSION  WP3(  ■i,  6 ) - y?2(  j,  'n  , JJ'Jf  ( 3 ) , T3UF  (15),  J3'JF  C 4 ) , 3AT  4(13) 
JIM5M3I0N  DATIN(13) 

    1 WRITE(1,5)
    5 FORMAT("PUT WEIGHT TAPE IN READER AND PRESS RUN")
      PAUSE

P£AD( 5,  * ) ( ( JP3( I , J ) , J= 1 , 5 ) , I = 1 , 9 ) 

"1£AD(  5,  * ) ( ( y’2(  I , J ),  J = 1 , 4 ) , I = 1 , 5 ) 

    9 WRITE(1,13)
   13 FORMAT("PUT DATA TAPE IN READER AND PRESS RUN")
      PAUSE

2£  AD  ( 5 , * ) ( DAT  I N ( :-l  ) , 3 = 1 / 1 3 ) 

DATA( 1 )=DAT IN( 1 ) 

DATA(2)=DATIN(2) 

DATA(3)=DATIN(5) 

DATA( 4 ) =DATIN( I 3) 

DATA(5)=DATIN(7) 

DATA ( 5 )=DAT IN( 1 3 ) 

DATA(7)=DATIN(5) 

DATA(3)=DATIN( 13) 

DATA( 9 ) =DATIN( 5 ) 

DATA( 13)=DATIN(5) 

DO  31  J=l,5 
DO  21  1=1,0 

I ; = 7-  I 

2 1 V3'JF(  I7)=7'’3(  J,  I ) 

JJ’JF(2)  = DaTA(2*J-1  ) 

D3'JF  ( 1 ) = JATA(  2*J  ) 

3 1 3ALL  POLY(  3,D3’JF(  1 ),  J3'JF(  1 ),33'JF(  J)  ) 

DATA( 1 )=33JF( 1 ) 

JATA(2)  = '13'JF('*) 

DATA(  3)  = 33'JF(  1 ) 

DATA(4)  = 33'JF(5) 

JATA( 5 )=33JF ( 3) 

DATA(6)=33JF(5) 

DATA(7)=33JF(2) 

DATA ( 3 ) =33JF ( 3 ) 

DO  32  J=6,9 
DO  22  1=1,3 
17=7-1 

22  W37F ( I7)=VP3( J, I ) 

D3JF(2)=DATA(2*J-1 1 ) 

D33F( 1 )=DATA(2*J- 10) 

3 2 CALL  ?0LY(3,  DB’JF(  1 ),'rf37F(  1 ),  33’JF(  J)  ) 

DO  23  1=1,4 
17=5-1 

2 3 73’JF(  I7)=UP2(  1 , I ) 

DO  231  1=1,4 
17=9-1 

2 31  y3'JF(  I7)=V?2(2,  I ) 

D3'JF(2)=33’iF(7) 

D37F(  1 )=33'JF(9) 
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4 


/ ’ ?? 
wm!:. 


.«  ; 


J3  JFC  'i  ) = 13’JF  ( 7 ) 

D3  JFC  3)  = "l3'JFr:S  ) 

CALL  POLVC  2,  D3’JF  C 1 )>'J3'JF(  1 ) , 33‘JF  ( 13)  ) 
30  232  1=  1 , '1 
17=5-1 

2 32  W37F  ( I ;)=VP2(  3,  I ) 

33JF ( 2) =337F  C j ) 

33’JF  ( 1 ) = P3’JF  < 7 ) 

33'JF(A)=3.3 

D37F(3)=3.3 

CALL  P0LY(2,33JFC  1 )>V3'JF(  1 ),P3  JF(  12)  ) 

30  2A  I = UA 

17=5-1 

2 A 73  JF(  I 7)=yP2(A,  I ) 

DO  2A1  I = U4 
17=9- I 

2 41  V37F( 17)=WP2(5>  I ) 

'F(2)=33UF(  13) 

F ( 1 ) = 33JF (12) 

■ ( 4 ) =H3'JF  (11) 

- r ( 3 ) = P3'JF  (12) 

>.ALL  P0LY(2,33JF(  1 ),W3’JF(  1 ) > P3’JF  ( 13)) 

30  25  I=l>4 

17=5-1 

25  W3UF( I7)=VP2(5, I ) 

D3JF ( 2)=R37F( 13) 

D3'JF(  1 )=R37F(  14) 

D3'JF(4)=0.0 
33JF( 3)=3.3 

CALL  P0LY(2,D3'JF(  1 )^W37F(  1 ) , R3'JF  ( 15)  ) 
      WRITE(1,30)RBUF(15)
   30 FORMAT("RESULT = ",E13.6)

GO  TO  9 
STOP 
END 
ENDS 
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PAGE  0002  #01  SERVICE  ROUTINE  POLY 


0001  00000                  NAM  POLYS
0002                         ENT  POLY
0003                         EXT  .IOC.,.ENTR,.STOP,ENDIO
0004  00000  000000   ARGS   BSS  4
0005  00004  000000   POLY   NOP
0006  00005  016002X         JSB  .ENTR
0007  00006  000000R         DEF  ARGS
0008  00007  166000R         LDB  ARGS,I
0009  00010  016043R         JSB  DWA
0010  00011  032145R         IOR  SGN
0011  00012  072152R         STA  D+4
0012  00013  062144R         LDA  ARGSA
0013  00014  066143R         LDB  DESTA
0014  00015  105777          MVW  TWO
      00016  000162R
      00017  000000
0015  00020  062003R         LDA  ARGS+3
0016  00021  072154R         STA  D+6
0017  00022  016001X         JSB  .IOC.
0018  00023  010011          OCT  10011
0019  00024  016115R         JSB  ERRX
0020  00025  000146R         DEF  D
0021  00026  177773          OCT  -5
0022  00027  072155R         STA  BSCT
0023  00030  016004X         JSB  ENDIO
0024  00031  000032R         DEF  *+1
0025  00032  016001X  READ   JSB  .IOC.
0026  00033  020011          OCT  20011
0027  00034  016115R         JSB  ERRX
0028  00035  000153R         DEF  D+5
0029  00036  177776          OCT  -2
0030  00037  072155R         STA  BSCT
0031  00040  016004X         JSB  ENDIO
0032  00041  000042R         DEF  *+1
0033  00042  126004R         JMP  POLY,I
0034  00043  000000   DWA    NOP
0035  00044  056162R         CPB  TWO
0036  00045  026055R         JMP  TO
0037  00046  056163R         CPB  THREE
0038  00047  026066R         JMP  TRE
0039  00050  056164R         CPB  FOUR
0040  00051  026077R         JMP  FOR
0041  00052  056165R         CPB  SIX
0042  00053  026106R         JMP  SX
0043  00054  026134R         JMP  ERRX1
0044  00055  076153R  TO     STB  AA
0045  00056  066162R         LDB  TWO
0046  00057  076153R         STB  AA
0047  00060  005000          BLS
0048  00061  076146R         STB  D
0049  00062  005000          BLS
0050  00063  076147R         STB  W
0051  00064  062157R         LDA  P2
0052  00065  126043R         JMP  DWA,I
0053  00066  006404   TRE    CLB,INB
0054  00067  076153R         STB  AA
0055  00070  005000          BLS


0056  00071  076146R         STB  D
0057  00072  005000          BLS
0058  00073  046162R         ADB  TWO
0059  00074  076147R         STB  W
0060  00075  062160R         LDA  P3
0061  00076  126043R         JMP  DWA,I
0062  00077  005100   FOR    BRS
0063  00100  076147R         STB  W
0064  00101  005100          BRS
0065  00102  076146R         STB  D
0066  00103  002400          CLA
0067  00104  076153R         STB  AA
0068  00105  126043R         JMP  DWA,I
0069  00106  066162R  SX     LDB  TWO
0070  00107  076147R         STB  W
0071  00110  005000          BLS
0072  00111  076146R         STB  D
0073  00112  076153R         STB  AA
0074  00113  062161R         LDA  P6
0075  00114  126043R         JMP  DWA,I
0076  00115  000000   ERRX   NOP
0077  00116  006021          SSB,RSS
0078  00117  026134R         JMP  ERRX1
0079  00120  062155R         LDA  BSCT
0080  00121  052164R         CPA  FOUR
0081  00122  026125R         JMP  *+3
0082  00123  036155R         ISZ  BSCT
0083  00124  062115R         LDA  ERRX
0084  00125  042156R         ADA  M3
0085  00126  124000          JMP  A,I
0086  00127  002400          CLA
0087  00130  072155R         STA  BSCT
0088  00131  062140R         LDA  I
0089  00132  066141R         LDB  BS
0090  00133  016003X         JSB  .STOP
0091  00134  062140R  ERRX1  LDA  I
0092  00135  066142R         LDB  UN
0093  00136  026132R         JMP  ERRX1-2
0094  00137  077777   MPOS   OCT  77777
0095  00140  000012   I      OCT  12
0096  00141  041123   BS     ASC  1,BS
0097  00142  052516   UN     ASC  1,UN
0098  00143  000150R  DESTA  DEF  D+2
0099  00144  000001R  ARGSA  DEF  ARGS+1
0100  00145  100000   SGN    OCT  100000
0101  00146  000000   D      BSS  7
0102  00155  000000   BSCT   NOP
0103  00156  177775   M3     OCT  -3
0104  00157  030000   P2     OCT  30000
0105  00160  020000   P3     OCT  20000
0106  00161  010000   P6     OCT  10000
0107  00162  000002   TWO    OCT  2
0108  00163  000003   THREE  OCT  3
0109  00164  000004   FOUR   OCT  4
0110  00165  000006   SIX    OCT  6
0111  00000           A      EQU  0
0112  00001           B      EQU  1
0113  00153           AA     EQU  D+5
0114  00147           W      EQU  D+1
0115  00166  000000   CTR    NOP
0116  00167  000170R  ANSAD  DEF  *+1
0117  00170  000000   ANSBF  BSS  8
0118  00200  000000   TEMP   NOP
0119  00201  000202R  INTAD  DEF  *+1
0120  00202  000000   INTBF  BSS  24
0121
0122  00232  000000   TEM1   NOP
                             END
** NO ERRORS*
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DRIVER  D.66 


PAGE 0003 #01  ** BCS DRIVER D.66 **

0025*
0026*        INITIATOR SECTION
0027*
0028  00000  000000   D.66   NOP              ENTRY & EXIT
0029  00001  072242R         STA  SAVA        SAVE EQT ENTRY ADDRESS
0030  00002  076243R         STB  SAVB        SAVE WD 2 ADDRESS
0031  00003  160001          LDA  B,I         GET WD 2 OF REQUEST
0032  00004  001700          ALF              ROTATE
0033  00005  012274R         AND  M17         AND ISOLATE RCODE
0034  00006  002002          SZA              =0??
0035  00007  026015R         JMP  D.XX        NO, THEN CONTINUE
0036*
0037*        RCODE=0 -- TERMINATE OPERATION
0038*
0039  00010  126000R  I.1    JMP  D.66,I      THIS IS CLC AFTER 1ST OPERAT
0040  00011  072242R         STA  SAVA
0041  00012  062000R         LDA  D.66        SET EXIT OF CONTINUATOR
0042  00013  072152R         STA  I.66        SECTION TO .IOC
0043  00014  026222R         JMP  STAT        NOW GO TO CONTINUATOR AND CL
0044*
0045*        DRIVER BUSY TEST
0046*
0047  00015  066253R  D.XX   LDB  DFLG        IF DRIVER BUSY
0048  00016  006002          SZB              (DFLG NOT=0), THEN
0049  00017  026077R         JMP  REJB        REJECT REQUEST
0050*
0051*        TEST FOR ILLEGAL REQUEST CODES
0052*
0053  00020  052270R         CPA  B1          =1 THEN
0054  00021  026025R         JMP  WRITE       WRITE REQUEST
0055  00022  052271R         CPA  B2          =2 THEN
0056  00023  026060R         JMP  READ        READ REQUEST
0057  00024  026076R         JMP  REJC        IF NEITHER THEN REJECT REQUE
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0059* 

0060*        WRITE REQUEST PROCESSING
0061*
0062  00025  016102R  WRITE  JSB  SETIO       GO AND CONFIGURE I/O INSTS
0063  00026  002404          CLA,INA          SET READ/WRITE
0064  00027  072254R         STA  R/W         FLAG NOT=0
0065  00030  016133R         JSB  GETBF       GO GET BUFFER START (LENGTH =5)
0066  00031  162251R         LDA  BUF,I       GET # OF DATA WORDS
0067  00032  001000          ALS              MPY BY 2 TO GET # OF WDS TO TRANSFER
0068  00033  003004          CMA,INA          MAKE NEGATIVE AND SAVE
0069  00034  072255R         STA  DCNT        AS DATA COUNTER
0070  00035  036251R         ISZ  BUF         INDEX TO NEXT WORD
0071  00036  162251R         LDA  BUF,I       GET # OF WEIGHTS
0072  00037  001000          ALS              X2 TO GET WDS
0073  00040  003004          CMA,INA          MAKE NEGATIVE AND
0074  00041  072261R         STA  WCNT        SAVE AS WEIGHT COUNTER
0075  00042  036251R         ISZ  BUF         INDEX TO NEXT WORD
0076  00043  162251R         LDA  BUF,I       GET ADDRESS OF 1ST DATA WORD
0077  00044  072256R         STA  DADD        AND SAVE
0078  00045  036251R         ISZ  BUF         INDEX TO NEXT WORD
0079  00046  162251R         LDA  BUF,I       GET ADDRESS OF 1ST WEIGHT
0080  00047  072262R         STA  WADD        AND SAVE
0081  00050  036251R         ISZ  BUF         INDEX ONCE MORE
0082  00051  162251R         LDA  BUF,I       GET POLY TYPE
0083  00052  072250R         STA  TYPE        AND SAVE
0084  00053  102600   I.3    OTA  0           OUTPUT COMMAND WORD
0085  00054  072253R         STA  DFLG        SET TO INDICATE BUSY IE. NOT=0
0086  00055  002400          CLA              SET FOR OPERATION INITIATED
0087  00056  103700   I.4    STC  0,C         START DEVICE
0088  00057  126000R         JMP  D.66,I      -EXIT- BACK TO .IOC
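The write-request code stores its word counts as negative numbers so that a later increment-and-skip-if-zero instruction can signal the end of a transfer. The Python sketch below models that convention on 16-bit words. It is an illustration only: the helper names `negate`, `isz` and `transfers_for` are hypothetical, and the left shift is modelled as a plain doubling, which matches an arithmetic left shift only for small positive counts.

```python
WORD_MASK = 0xFFFF  # 16-bit word


def negate(word: int) -> int:
    """CMA,INA: complement and increment, i.e. two's-complement negation."""
    return ((~word) + 1) & WORD_MASK


def isz(word: int):
    """ISZ: increment a 16-bit word and report whether it rolled over to zero."""
    word = (word + 1) & WORD_MASK
    return word, word == 0


def transfers_for(n_items: int) -> int:
    """Count the words sent before the negative counter reaches zero."""
    counter = negate((n_items << 1) & WORD_MASK)  # doubling, then CMA,INA
    sent = 0
    while counter != 0:
        sent += 1                  # one word goes to the device
        counter, _ = isz(counter)  # counter climbs toward zero
    return sent
```

For a request of five items the counter starts at minus ten, so ten words are transferred before it reaches zero; a count of zero transfers nothing.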
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0090# 
0091*        READ REQUEST PROCESSING
0092*
0093  00060  016102R  READ   JSB  SETIO       GO AND CONFIGURE I/O INSTS
0094  00061  002400          CLA              SET R/W FLAG
0095  00062  072254R         STA  R/W         =0 TO INDICATE READ
0096  00063  016133R         JSB  GETBF       GO GET BUFFER START (LENGTH =2)
0097  00064  162251R         LDA  BUF,I       GET # OF ANSWERS
0098  00065  001000          ALS              X2 TO GET WDS
0099  00066  003004          CMA,INA          MAKE NEGATIVE
0100  00067  072257R         STA  ANSCT       AND SAVE AS COUNTER
0101  00070  036251R         ISZ  BUF         INDEX TO NEXT WORD
0102  00071  162251R         LDA  BUF,I       GET ADDRESS OF 1ST ANSWER DESTI
0103  00072  072260R         STA  ANSAD       AND SAVE
0104  00073  062250R         LDA  TYPE        GET COMMAND WORD
0105  00074  032272R         IOR  .14.        ADD READ FLAG
0106  00075  026053R         JMP  I.3         GO OUTPUT IT AND RETURN TO .IOC
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PAGE 

0006 #01  ** BCS DRIVER D.66 **

0108*
0109*        REJECT SECTION
0110*
0111  00076  006401   REJC   CLB,RSS          RCODE ERROR
0112  00077  066273R  REJB   LDB  M15         DRIVER/DEVICE BUSY =(B)SIGN=
0113  00100  002404          CLA,INA          SET A NOT=0
0114  00101  126000R         JMP  D.66,I      -EXIT- TO .IOC
0115*
0116*        I/O CONFIGURATION ROUTINE
0117*
0118  00102  000000   SETIO  NOP              ENTRY & EXIT
0119  00103  162242R         LDA  SAVA,I      GET WORD 1 OF EQT ENTRY
0120  00104  012275R         AND  M77         ISOLATE SELECT CODE
0121  00105  032263R         IOR  SFSI        COMBINE WITH SFS INSTRUCTION
0122  00106  072121R         STA  I.2         SAVE
0123  00107  022267R         XOR  LIAM        MAKE LIA INST
0124  00110  072164R         STA  I.6         SAVE
0125  00111  022265R         XOR  CLCM        MAKE CLC INST
0126  00112  072010R         STA  I.1         SAVE
0127  00113  022266R         XOR  STCM        MAKE STC X,C INST
0128  00114  072056R         STA  I.4
0129  00115  072211R         STA  I.8         SAVE
0130  00116  022264R         XOR  OTAM        NOW MAKE OTA INST
0131  00117  072053R         STA  I.3         SAVE
0132  00120  072202R         STA  I.7         SAVE AGAIN
0133  00121  102300   I.2    SFS  0           IF FLAG NOT SET THEN
0134  00122  026077R         JMP  REJB        REJECT REQUEST
0135*
0136*        SET EQT BUSY FLAG
0137*
0138  00123  036242R         ISZ  SAVA        SET ADDRESS TO WD 2 OF EQT
0139  00124  162242R         LDA  SAVA,I      ENTRY, SET BIT 15 OF
0140  00125  032273R         IOR  M15         WD 2=1 (AFIELD=2) TO
0141  00126  172242R         STA  SAVA,I      SAY BUSY
0142  00127  062242R         LDA  SAVA        SET A ADDRESS OF
0143*
0144*        STORE ADDRESS OF EQT WORD 3 IN DRIVER
0145*
0146  00130  002004          INA              EQT WORD 3
0147  00131  072247R         STA  EQTA        IN EQTA
0148  00132  126102R         JMP  SETIO,I     AND BACK TO CALLER
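The I/O configuration routine builds each device instruction from the previous one with a single exclusive OR, so that the select code only has to be merged in once. The Python sketch below replays that chain using the octal constants from the driver's constant section (SFSI, LIAM, CLCM, STCM, OTAM and M77). It is an illustration of the bit manipulation only; the function name `configure_io` and the dictionary keys are hypothetical, and nothing is stored into instruction locations as the driver does.

```python
SFSI = 0o102300  # SFS 0 instruction template
LIAM = 0o000600  # XOR mask: SFS   -> LIA
CLCM = 0o004200  # XOR mask: LIA   -> CLC
STCM = 0o005000  # XOR mask: CLC   -> STC x,C
OTAM = 0o001100  # XOR mask: STC,C -> OTA
M77 = 0o77       # low six bits hold the select code


def configure_io(eqt_word1: int) -> dict:
    """Return the configured I/O instruction words for the device's select code."""
    a = (eqt_word1 & M77) | SFSI   # AND M77, IOR SFSI
    words = {"SFS": a}
    a ^= LIAM
    words["LIA"] = a
    a ^= CLCM
    words["CLC"] = a
    a ^= STCM
    words["STC,C"] = a
    a ^= OTAM
    words["OTA"] = a
    return words
```

With a select code of zero the chain reproduces the instruction words printed in the listing for `SFS 0` (102300), `LIA 0` (102500), `STC 0,C` (103700) and `OTA 0` (102600).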
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PAGE 

0007 #01  ** BCS DRIVER D.66 **

0150*
0151*        ROUTINE TO GET USERS BUFFER ADDRESS AND SAVE IT
0152*
0153  00133  000000   GETBF  NOP              ENTRY & EXIT
0154  00134  036243R         ISZ  SAVB        INDEX TO
0155  00135  036243R         ISZ  SAVB        WORD 4 OF REQUEST
0156  00136  062243R         LDA  SAVB        GET WORD 4
0157  00137  160000          LDA  A,I         OF REQUEST
0158  00140  001275          RAL,CLE,SLA,ERA  (IF INDIRECT GET
0159  00141  026137R         JMP  *-2         EFFECTIVE ADDRESS)
0160  00142  072251R         STA  BUF         SAVE
0161*
0162*        GET BUFFER LENGTH
0163*
0164  00143  036243R         ISZ  SAVB        INDEX TO WORD 5 OF REQUEST
0165  00144  162243R         LDA  SAVB,I      GET WORD 5
0166  00145  003004          CMA,INA          MAKE POSITIVE
0167  00146  006400          CLB
0168  00147  176247R         STB  EQTA,I      CLEAR OUT XMISSION LOG
0169  00150  072252R         STA  LENG        SAVE BUFFER LENGTH
0170  00151  126133R         JMP  GETBF,I     -EXIT- TO CALLER
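The buffer-address routine follows a chain of indirect addresses until it reaches a word whose top bit is clear. A minimal Python sketch of that loop is shown below. It assumes a 16-bit word in which bit 15 marks an indirect address and the low 15 bits hold the address, and it models memory as a dictionary; the name `effective_address` is hypothetical.

```python
INDIRECT_BIT = 0x8000   # bit 15 set means "this word points to another address"
ADDRESS_MASK = 0x7FFF   # low 15 bits hold the address itself


def effective_address(memory: dict, pointer: int) -> int:
    """Fetch the word at pointer and chase indirect links until a direct address."""
    word = memory[pointer]
    while word & INDIRECT_BIT:
        # Strip the indirect bit and load the word it points to.
        word = memory[word & ADDRESS_MASK]
    return word
```

A direct address is returned after one fetch; each additional level of indirection costs one more pass through the loop.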



PAGE 0008 #01  ** BCS DRIVER D.66 **

0172*
0173*        I.66 CONTINUATOR SECTION
0174*
0175*
0176*
0177  00152  000000   I.66   NOP              ENTRY & EXIT
0178  00153  072244R         STA  SAVAX       SAVE A
0179  00154  076245R         STB  SAVBX       B,
0180  00155  001520          ERA,ALS          E
0181  00156  102201          SOC              &
0182  00157  002004          INA
0183  00160  072246R         STA  SAVEX       OVERFLOW
0184*
0185*
0186  00161  062254R         LDA  R/W         IS THIS A READ
0187  00162  000010          SLA              OR WRITE??
0188  00163  026174R         JMP  .WR         WRITE
0189*
0190*        READ PROCESSING
0191*
0192  00164  102500   I.6    LIA  0           GET ANSWER FROM DEVICE
0193  00165  003000          CMA              INVERT DATA FOR KAL
0194  00166  172260R         STA  ANSAD,I     DELIVER TO SERVICE ROUTINE
0195  00167  036257R         ISZ  ANSCT       INDEX TO NEXT
0196  00170  002001          RSS
0197  00171  026222R         JMP  STAT        DONE FINISH UP
0198  00172  036260R         ISZ  ANSAD       BUMP ADDRESS TO NEXT WORD
0199  00173  026203R         JMP  XIT         GO TO EXIT ROUTINE
0200*
0201*        WRITE PROCESSING
0202*
0203  00174  062255R  .WR    LDA  DCNT        GET DATA COUNTER
0204  00175  002003          SZA,RSS          =0??
0205  00176  026213R         JMP  .WR1        YES GOTO WEIGHT ROUTINE
0206  00177  162256R         LDA  DADD,I      NO, GET ANOTHER DATA POIN
0207  00200  036255R         ISZ  DCNT        MORE DATA??
0208  00201  036256R         ISZ  DADD        YES, SET FOR NEXT POINT
0209  00202  102600   I.7    OTA  0           OUTPUT WORD
0210*
0211*        REGISTER RESTORE SECTION
0212*
0213  00203  062246R  XIT    LDA  SAVEX       RESTORE
0214  00204  103101          CLO              E
0215  00205  000036          SLA,ELA          OVERFLOW
0216  00206  102101          STF  1           A
0217  00207  062244R         LDA  SAVAX       AND B AT
0218  00210  066245R         LDB  SAVBX       TIME OF INTERRUPT
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0220« 

0221*        EXIT SECTION
0222*
0223  00211  103700   I.8    STC  0,C         START DEVICE FOR NEXT WORD (OR
0224  00212  126152R         JMP  I.66,I      - EXIT -
0225  00213  062261R  .WR1   LDA  WCNT        GET WEIGHT COUNTER
0226  00214  002003          SZA,RSS          =0??
0227  00215  026222R         JMP  STAT        YES FINISH
0228  00216  162262R         LDA  WADD,I      NO! GET NEXT WEIGHT
0229  00217  036261R         ISZ  WCNT        MORE??
0230  00220  036262R         ISZ  WADD        YES SET FOR NEXT TIME
0231  00221  026202R         JMP  I.7         NO GO TO OUTPUT ROUTINE
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0233« 

0234*        STATUS SECTION
0235*
0236*        UPDATE XMISSION LOG
0237*
0238  00222  062254R  STAT   LDA  R/W         GET READ WRITE FLAG
0239  00223  002002          SZA              READ OR WRITE
0240  00224  042271R         ADA  B2          ADD 2
0241  00225  042271R         ADA  B2          ADD 2 AGAIN IF WRITE
0242  00226  172247R         STA  EQTA,I      RESTORE WORD 3 OF EQT
0243*
0244*        UPDATE STATUS SECTION
0245*
0246  00227  003400          CCA              SET ADDRESS FOR WORD
0247  00230  042247R         ADA  EQTA        2 OF
0248  00231  072251R         STA  BUF         EQT
0249  00232  162251R         LDA  BUF,I       GET WORD 2
0250  00233  012276R         AND  MST         REMOVE PREVIOUS STATUS
0251  00234  172251R         STA  BUF,I       RESTORE WORD 2 OF EQT
0252*
0253*        CLEAR DRIVER BUSY FLAG & EXIT
0254*
0255  00235  002400          CLA              CLEAR
0256  00236  072253R         STA  DFLG        DRIVER BUSY FLAG
0257  00237  062010R         LDA  I.1         SET CLC
0258  00240  072211R         STA  I.8         IN EXIT SECTION
0259  00241  026203R         JMP  XIT         ==DONE==
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0 26  I * 

0262*
0263*        CONSTANT & STORAGE SECTION
0264  00000           A      EQU  0
0265  00001           B      EQU  1
0266  00242  000000   SAVA   NOP
0267  00243  000000   SAVB   NOP
0268  00244  000000   SAVAX  NOP
0269  00245  000000   SAVBX  NOP
0270  00246  000000   SAVEX  NOP
0271*
0272  00247  000000   EQTA   NOP
0273  00250  000000   TYPE   NOP
0274  00251  000000   BUF    NOP
0275  00252  000000   LENG   NOP
0276  00253  000000   DFLG   NOP
0277  00254  000000   R/W    NOP
0278  00255  000000   DCNT   NOP
0279  00256  000000   DADD   NOP
0280  00257  000000   ANSCT  NOP
0281  00260  000000   ANSAD  NOP
0282  00261  000000   WCNT   NOP
0283  00262  000000   WADD   NOP
0284*
0285  00263  102300   SFSI   SFS  0
0286  00264  001100   OTAM   OCT  1100
0287  00265  004200   CLCM   OCT  4200
0288  00266  005000   STCM   OCT  5000
0289  00267  000600   LIAM   OCT  600
0290*
0291  00270  000001   B1     OCT  1
0292  00271  000002   B2     OCT  2
0293  00272  040000   .14.   OCT  40000
0294  00273  100000   M15    OCT  100000
0295  00274  000017   M17    OCT  17
0296  00275  000077   M77    OCT  77
0297  00276  037400   MST    OCT  37400
0298*
0299                         END
** NO ERRORS*
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